I have about 200,000 high-resolution images, and loading such high-quality images every time is time-consuming. Preloading all of them might take too much memory. What about saving each image in .npz format and loading the .npz files instead of the .jpg files? Would that speed things up?
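To make the trade-off in the question concrete, here is a minimal NumPy sketch (the helper names save_as_npz and load_from_npz are hypothetical). An .npz archive holds the already-decoded pixel array, so loading it skips JPEG decoding. However, an uncompressed archive takes roughly H*W*3 bytes per image, usually many times more than the JPEG, so whether it is faster depends on whether your pipeline is limited by decoding or by disk reads. Measure on your own data before converting 200,000 files.

```python
import os
import tempfile

import numpy as np


def save_as_npz(array, path, compressed=False):
    """Store one decoded image array in a .npz archive under the key 'image'."""
    if compressed:
        np.savez_compressed(path, image=array)  # zlib-compressed: smaller file, slower to load
    else:
        np.savez(path, image=array)             # uncompressed: raw pixel bytes on disk


def load_from_npz(path):
    """Load the image array back; no JPEG decoding is involved."""
    with np.load(path) as data:
        return data["image"]


if __name__ == "__main__":
    # Hypothetical stand-in for one decoded 1024x1536 RGB image.
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(1024, 1536, 3), dtype=np.uint8)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "img_000001.npz")
        save_as_npz(image, path)
        restored = load_from_npz(path)
        print("raw pixel bytes:", image.nbytes)
        print("npz file bytes: ", os.path.getsize(path))
        print("round trip ok:  ", np.array_equal(image, restored))
```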
During our testing, we found that image compression improved loading times by around 10% in most cases. That is the lower end of the range, though: in some tests we saw improvements of up to 24.29%.
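Figures like these depend heavily on the images, the storage and the decoder, so it is worth reproducing them on your own data. A small standard-library sketch (function names are hypothetical) that averages the time of a loading function and reports the improvement as (before - after) / before * 100:

```python
import time


def mean_seconds(fn, repeats=5):
    """Average wall-clock time of calling fn() `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def percent_improvement(before_s, after_s):
    """Relative reduction in loading time, in percent."""
    if before_s <= 0:
        raise ValueError("baseline time must be positive")
    return (before_s - after_s) / before_s * 100.0
```

For example, percent_improvement(mean_seconds(load_old), mean_seconds(load_new)), where load_old and load_new are your own zero-argument loading functions. A drop from 2.0 s to 1.5 s is a 25% improvement.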
Make sure that you scale the original images down to the dimensions you actually need BEFORE sending them to the browser. The resized images are much smaller than the originals and will load much faster.
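In a training pipeline, the equivalent step is to downscale the originals once, offline, to the resolution the model actually consumes, so each epoch decodes a small JPEG instead of a full-resolution one. A minimal Pillow sketch; the function name, the 512x512 bound and the quality setting are assumptions, and it only processes *.jpg files in a single directory:

```python
from pathlib import Path

from PIL import Image


def downscale_folder(src_dir, dst_dir, max_size=(512, 512), quality=90):
    """Write a resized copy of every .jpg in src_dir into dst_dir, keeping aspect ratio."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.jpg")):
        with Image.open(path) as img:
            img = img.convert("RGB")
            # thumbnail() resizes in place, preserves aspect ratio and never upscales.
            img.thumbnail(max_size)
            img.save(dst / path.name, "JPEG", quality=quality)
        count += 1
    return count
```

Because this runs once before training, its cost is paid a single time rather than on every epoch.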
You do not need to load all the images into memory at once. Considering also that data augmentation has to be applied to the dataset during model training, loading every image up front is not practical anyway.
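One alternative to one .npz file per image (not something the answer itself proposes) is a single memory-mapped NumPy array: the images live on disk in one .npy file and only the pages you index are read into RAM. This sketch assumes every image has already been resized to the same height and width, which a single array requires; the names build_image_store and MemmapImages are hypothetical.

```python
import numpy as np


def build_image_store(path, num_images, height, width):
    """Create an on-disk uint8 array of shape (N, H, W, 3) to be filled one image at a time."""
    return np.lib.format.open_memmap(
        path, mode="w+", dtype=np.uint8, shape=(num_images, height, width, 3)
    )


class MemmapImages:
    """Random access to images stored in a .npy file without loading the whole file."""

    def __init__(self, path):
        # mmap_mode="r" maps the file read-only; data is read from disk only when accessed.
        self.images = np.load(path, mmap_mode="r")

    def __len__(self):
        return self.images.shape[0]

    def __getitem__(self, idx):
        # np.array(...) copies just this one image into regular memory.
        return np.array(self.images[idx])
```

Fill the store once (store[i] = resized_image, then store.flush()), after which MemmapImages can back a PyTorch Dataset.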
In PyTorch, you can use a Dataset to hold your training and validation sets. The torchvision dataset classes (for example ImageFolder) take a transform parameter (e.g., Resize, formerly called Scale, RandomCrop, etc.), which is used to transform each training image on the fly during training. Several ready-made datasets are also provided by the torchvision package, see here.
PyTorch's built-in DataLoader has a num_workers parameter, which controls how many subprocesses are used to load the data. Since your dataset is not that huge, that should be enough for your needs. For how to choose an appropriate number of workers, see here.
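A minimal sketch of wrapping a dataset in a DataLoader with worker processes. The value num_workers=4 is only a placeholder to tune by measurement, and pin_memory and persistent_workers are real DataLoader options that are enabled here as a design choice, not something the answer prescribes; make_loader is a hypothetical helper name.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=64, num_workers=4, shuffle=True):
    """Wrap a Dataset in a DataLoader that loads and augments batches in worker subprocesses."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,               # 0 means loading happens in the main process
        pin_memory=torch.cuda.is_available(),  # page-locked memory speeds host-to-GPU copies
        persistent_workers=num_workers > 0,    # keep workers alive between epochs (needs workers > 0)
    )


if __name__ == "__main__":
    # Toy stand-in dataset: 10 fake 3x8x8 "images" with integer labels.
    data = TensorDataset(torch.randn(10, 3, 8, 8), torch.arange(10))
    loader = make_loader(data, batch_size=4, num_workers=2)
    for images, labels in loader:
        print(images.shape, labels.shape)
```

The __main__ guard matters on platforms that start workers with spawn, where each worker re-imports the script.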
There are discussions on the PyTorch forum about fast image loading; post1 and post2 are a good place to start.