What is the fastest way of loading images?

Tags:

pytorch

I have about 200,000 high-resolution images, and loading them from disk every time is time-consuming. Preloading all of them might use too much memory. What about saving each image as an .npz file and loading the .npz instead of the .jpg? Would that speed things up?

536

asked Dec 05 '17 00:12

nashory

1 Answer

You do not need to load all the images into memory at once. Since data augmentation is usually applied on the fly during training, preloading every image is also impractical.

In PyTorch, you can wrap your training and validation sets in a Dataset. The dataset classes in torchvision accept a transform argument (e.g., Resize, formerly called Scale, or RandomCrop), which is applied to each training image on the fly during training. The torchvision package also provides several ready-made datasets, see here.
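As a rough illustration of the idea above, here is a minimal sketch of a dataset that keeps only file paths in memory and decodes each image when it is requested. The class name LazyImageDataset, the pattern argument, and the flat directory layout are assumptions made for this example, not something from the question. The transform is any callable that takes and returns a PIL image.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class LazyImageDataset(Dataset):
    """Hypothetical dataset that decodes one image per __getitem__ call."""

    def __init__(self, image_dir, transform=None, pattern="*.jpg"):
        # Only the file paths are stored, not the pixel data.
        self.paths = sorted(Path(image_dir).glob(pattern))
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Decode from disk on demand; force RGB for a fixed channel count.
        with Image.open(self.paths[index]) as img:
            img = img.convert("RGB")
        if self.transform is not None:
            img = self.transform(img)  # PIL image in, PIL image out
        array = np.array(img, dtype=np.uint8)  # H x W x C copy
        # HWC uint8 -> CHW float32 scaled to [0, 1]
        return torch.from_numpy(array).permute(2, 0, 1).float() / 255.0
```

For example, `LazyImageDataset("images/", transform=lambda im: im.resize((224, 224)))` would resize every image as it is loaded. In practice the transforms from torchvision can be passed in the same way, as long as the last step still yields something this sketch can convert to a tensor.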

Basic method

PyTorch's built-in DataLoader has a num_workers parameter, which controls how many subprocesses are used to load the data. Since your dataset is not that large, this should be enough for your needs. For how to choose an appropriate number of workers, see here.
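A minimal sketch of setting num_workers might look like the following. The RandomImages dataset and the make_loader helper are hypothetical stand-ins written for this example, and the default of at most 4 workers is only an assumed starting point; the right value depends on your CPU and disk and is best found by timing a few settings.

```python
import os

import torch
from torch.utils.data import DataLoader, Dataset


class RandomImages(Dataset):
    """Stand-in dataset (hypothetical) that yields random 3 x H x W tensors."""

    def __init__(self, length=32, size=16):
        self.length = length
        self.size = size

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # Seed per index so a sample is the same no matter which worker loads it.
        gen = torch.Generator().manual_seed(index)
        return torch.rand(3, self.size, self.size, generator=gen)


def make_loader(dataset, batch_size=8, num_workers=None):
    if num_workers is None:
        # Assumed heuristic, not a recommendation from the PyTorch docs.
        num_workers = min(4, os.cpu_count() or 1)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # number of loading subprocesses (0 = main process)
        pin_memory=torch.cuda.is_available(),  # can speed up host-to-GPU copies
    )
```

With num_workers greater than 0, batches are prepared in background processes while the model trains on the previous batch. On platforms that start workers with spawn (Windows, macOS), the code that iterates over the loader should sit under an `if __name__ == "__main__":` guard.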

More references

There are discussions on the PyTorch forum about fast image loading; post1 and post2 are good starting points.

answered Nov 05 '22 14:11

jdhao
