IO speed
Loading data from your storage device or decoding the images can become the bottleneck during training, especially with datasets like DIV2K that contain 2K resolution images. Some options to improve IO speed are:
- Put your training data on an SSD (Solid-State Drive), if your machine has one. Reading data from an SSD is much faster than reading from an HDD (Hard Disk Drive).
- Pre-crop the images to sub-images. During training, the data loader crops the images and only uses patches of size HR_size x HR_size, as configured in the options (for example, 128x128). There is therefore no need to read the whole images, which takes more resources and time to process in the dataloader. You can use `extract_subimgs_single.py` or any other mechanism to crop the large images into sub-image tiles. For example, you can crop the DIV2K images into 480x480 tiles with a sliding window of step = 240 using the script.
- (Optional) Use lmdb format. Reading raw images from your storage device consumes a lot of CPU resources because each image has to be decoded. lmdb is an alternative that pre-decodes the images and stores them in single indexed databases that can be read during training. Note that not all the dataloaders support lmdb, so it's worth checking before creating the databases.
- (Optional) If lmdb is not an option, another alternative to test is to decode the images to numpy arrays with cv2 and store them as .npy files, which can then be loaded directly as numpy arrays without decoding first.
These options can be combined (except 3 and 4, which are mutually exclusive), each contributing to better IO performance. Note also that on-the-fly augmentations can have some performance impact depending on the type of augmentation, although in some cases it is not noticeable.
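The pre-cropping option can be illustrated with a minimal sketch of how sliding-window tile positions might be computed. This is not the code of the script mentioned in the list; the function names `window_starts` and `tile_boxes` are hypothetical, and the extra edge-aligned window for leftover borders is an assumption made here for illustration.

```python
def window_starts(length, tile, step):
    """Start offsets of a sliding window of size `tile` along one axis."""
    if length < tile:
        return []  # the image is smaller than one tile along this axis
    starts = list(range(0, length - tile + 1, step))
    # Assumption: add one edge-aligned window so the border is not dropped.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def tile_boxes(height, width, tile=480, step=240):
    """Return (top, left, bottom, right) boxes for every tile position."""
    return [
        (top, left, top + tile, left + tile)
        for top in window_starts(height, tile, step)
        for left in window_starts(width, tile, step)
    ]


# Example with a hypothetical 1000x720 image (height x width).
boxes = tile_boxes(1000, 720)
print(len(boxes), boxes[0], boxes[-1])
# 8 (0, 0, 480, 480) (520, 240, 1000, 720)
```

With an image loaded as a NumPy array (for example via `cv2.imread`), each tile would then be `img[top:bottom, left:right]`, which can be written out as its own file.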
During training, a `td` variable is shown on the console and in the log files. It represents the dataloader time, which can help when debugging the IO behavior of the data.
A simple exercise to compare the options is to store the same images as regular PNGs, decoded NPYs and LMDB, loop through a dataset of each type multiple times and average the time (in seconds). Using a dataset of 50 paired images of dimensions 128x128 produces the following results on an HDD:
Metric | PNG | NPY | LMDB | - | NPY % PNG | LMDB % PNG |
---|---|---|---|---|---|---|
Average | 0.171707s | 0.136925s | 0.127708s | - | 79.74369% | 74.37564% |
STD | 0.019705s | 0.014681s | 0.013691s | - | 8.550096% | 7.973582% |
In this test, loading the NPY files took about 20% less time than loading the regular images, and loading from LMDB about 25% less.
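A comparison of this kind could be set up roughly as in the sketch below. It is not the code used to produce the numbers above: `time_loops` and `percent_of` are hypothetical helpers, `load_fn` stands for whatever function reads one item (a PNG, an NPY file or an LMDB entry), and using the sample standard deviation over whole passes is an assumption.

```python
import statistics
import time


def time_loops(load_fn, items, loops=5):
    """Time `loops` full passes over `items`; return (mean, std) in seconds."""
    durations = []
    for _ in range(loops):
        start = time.perf_counter()
        for item in items:
            load_fn(item)
        durations.append(time.perf_counter() - start)
    # Sample standard deviation needs at least two measurements.
    std = statistics.stdev(durations) if loops > 1 else 0.0
    return statistics.mean(durations), std


def percent_of(value, baseline):
    """Express `value` as a percentage of `baseline`."""
    return 100.0 * value / baseline


# Relative times computed from the averages reported in the table.
print(round(percent_of(0.136925, 0.171707), 2))  # NPY vs PNG: 79.74
print(round(percent_of(0.127708, 0.171707), 2))  # LMDB vs PNG: 74.38
```

Running `time_loops` once per storage format with the same images, and then comparing the means with `percent_of`, gives figures in the same form as the table.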
- Put the training images into a folder.
- Run the script `codes/scripts/create_lmdb.py`. It takes a required argument `images_path`, which is the folder with the training images, and an optional `lmdb_path`, which is a directory ending in `.lmdb` that will be set up as the dataroot in the training options file. This process has to be repeated for each dataroot that will be used during training (for example, `dataroot_HR` and `dataroot_LR`). The resulting directory will have the following file structure:
dataset.lmdb
├── data.mdb
├── lock.mdb
├── meta_info.txt
- In the configuration files (`.yml` or `.json`), write the lmdb paths in the corresponding dataroots, for example:
dataroot_HR: '../datasets/train/hr_dataset.lmdb'
dataroot_LR: '../datasets/train/lr_dataset.lmdb'
Note: it is currently required to set `n_workers: 0` in the dataloader options to use lmdb (at least on Windows); otherwise there can be a `PermissionError` due to multiple processes accessing the image database.
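One way to catch this before training starts is a small check on the parsed options, sketched below. The helper `lmdb_worker_warnings` is hypothetical and not part of the repository, and it assumes the dataset options are available as a plain dictionary with the `dataroot_HR`, `dataroot_LR` and `n_workers` keys shown above.

```python
def lmdb_worker_warnings(dataset_opt):
    """Return a list of warnings for lmdb dataroots used with workers."""
    # Assumption: dataset_opt is a dict parsed from the .yml or .json file.
    lmdb_keys = [
        key
        for key in ("dataroot_HR", "dataroot_LR")
        if str(dataset_opt.get(key, "")).endswith(".lmdb")
    ]
    workers = dataset_opt.get("n_workers")
    warnings = []
    # A missing n_workers is also flagged, since the default is not known here.
    if lmdb_keys and workers != 0:
        warnings.append(
            f"{', '.join(lmdb_keys)} use lmdb but n_workers is {workers!r}; "
            "set n_workers: 0 to avoid a possible PermissionError"
        )
    return warnings


opt = {
    "dataroot_HR": "../datasets/train/hr_dataset.lmdb",
    "dataroot_LR": "../datasets/train/lr_dataset.lmdb",
    "n_workers": 4,
}
for message in lmdb_worker_warnings(opt):
    print(message)
```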