
What is the fastest way of loading images?

Tags:

pytorch

I have about 200,000 high-resolution images, and loading such high-quality images every time is time-consuming. Preloading all the images might take up too much memory. How about saving each image in .npz format and loading the .npz files instead of the .jpg files? Would that speed things up?

nashory asked Dec 05 '17 00:12





1 Answer

You do not need to load all the images into memory at once. Also, because data augmentation is usually applied on the fly during model training, preloading every transformed image is not practical.

In PyTorch, you can use a Dataset to hold your training and validation sets. The dataset classes in torchvision accept a transform argument (e.g., Resize, formerly called Scale, RandomCrop, etc.), which is applied to each training image on the fly during training. Several ready-made datasets are also provided by the torchvision package, see here.
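As a rough illustration of the idea, here is a minimal sketch of a map-style dataset that keeps only file paths in memory and decodes an image when it is requested. The names LazyImageDataset and pil_loader are hypothetical, not part of PyTorch, and the default loader assumes Pillow is installed.

```python
from torch.utils.data import Dataset


def pil_loader(path):
    """Hypothetical default loader: decode one image file with Pillow."""
    from PIL import Image  # imported lazily; assumes Pillow is installed

    with Image.open(path) as img:
        return img.convert("RGB")


class LazyImageDataset(Dataset):
    """Hypothetical map-style dataset that stores paths, not pixels."""

    def __init__(self, paths, transform=None, loader=pil_loader):
        self.paths = list(paths)
        self.transform = transform  # e.g. a torchvision transform pipeline
        self.loader = loader

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The image is decoded only when this index is requested.
        image = self.loader(self.paths[index])
        if self.transform is not None:
            image = self.transform(image)
        return image
```

With this layout, memory use depends on the batch being processed rather than on the 200,000 files, and random augmentations are re-applied each time an image is fetched.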

Basic method

PyTorch's built-in DataLoader has a num_workers parameter, which controls how many subprocesses are used for loading the data. Since your dataset is not that large, this should be enough for your needs. For how to choose an appropriate number of workers, see here.
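A minimal sketch of how num_workers is passed to a DataLoader follows. The helpers make_loader and demo_dataset are hypothetical, the random tensors merely stand in for real images, and the default of 4 workers is an arbitrary starting value rather than a recommendation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=32, num_workers=4):
    """Wrap a dataset in a DataLoader that can load batches in subprocesses."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # 0 means loading happens in the main process
    )


def demo_dataset(n=8):
    """Stand-in data (an assumption): n random tensors shaped like small RGB images."""
    return TensorDataset(torch.rand(n, 3, 16, 16))
```

A typical call would be make_loader(my_dataset, num_workers=4), then iterating over the result in the training loop. On platforms that start workers with spawn (for example Windows), the iteration should sit under an if __name__ == "__main__": guard. The best worker count depends on CPU cores and disk speed, so it is worth timing a few values.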

More references

There are discussions on the PyTorch forum about fast image loading; post1 and post2 are good starting points.

jdhao answered Nov 05 '22 14:11
