
How to get entire dataset from dataloader in PyTorch

How to load entire dataset from the DataLoader? I am getting only one batch of dataset.

This is my code

import torch

dataloader = torch.utils.data.DataLoader(dataset=dataset, batch_size=64)
images, labels = next(iter(dataloader))
Aakanksha W.S asked Aug 07 '19 04:08

People also ask

What does DataLoader in PyTorch return?

DataLoader in your case is supposed to return a pair: (inputs batch, labels batch). For example, with batch_size=64, the 64 labels correspond to the 64 images in the batch.


What does Num_workers do in PyTorch?

num_workers denotes the number of processes that generate batches in parallel. A high enough number of workers ensures that CPU-side data loading keeps up, i.e. that the bottleneck is the neural network's forward and backward operations on the GPU, not batch generation.


3 Answers

You can set batch_size=len(dataset); if dataset is a torch Dataset, batch_size=dataset.__len__() is equivalent.

Beware, this might require a lot of memory depending upon your dataset.
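As a minimal sketch of this answer (the toy TensorDataset below is an assumption standing in for the asker's actual dataset), a single-batch dataloader looks like this:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy stand-in for the asker's dataset: 100 fake images with labels.
images = torch.randn(100, 3, 8, 8)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# One batch spanning the whole dataset.
dataloader = DataLoader(dataset, batch_size=len(dataset))
all_images, all_labels = next(iter(dataloader))
print(all_images.shape)  # torch.Size([100, 3, 8, 8])
```

The entire dataset is materialized as one tensor, which is where the memory warning above comes from.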

asymptote answered Oct 16 '22 23:10


I'm not sure whether you want to use the dataset somewhere other than network training (to inspect the images, for example) or want to iterate over the batches during training.

Iterating through the dataset

Either follow Usman Ali's answer (which might overflow your memory) or you could do

for i in range(len(dataset)): # or i, image in enumerate(dataset)
    images, labels = dataset[i] # or whatever your dataset returns

You are able to write dataset[i] because you implemented __len__ and __getitem__ in your Dataset class (as long as it's a subclass of the PyTorch Dataset class).
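To make the indexing above concrete, here is a minimal hypothetical Dataset subclass (the class name and the (x, x²) items are invented for illustration) showing the two methods that direct indexing relies on:

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Hypothetical toy dataset returning (x, x**2) pairs."""
    def __init__(self, n):
        self.n = n

    def __len__(self):          # enables len(dataset)
        return self.n

    def __getitem__(self, i):   # enables dataset[i]
        x = torch.tensor(float(i))
        return x, x ** 2

dataset = SquaresDataset(5)
for i in range(len(dataset)):
    x, y = dataset[i]  # iterate items directly, no DataLoader involved
```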

Getting all batches from the dataloader

The way I understand your question is that you want to retrieve all batches to train the network with. You should understand that iter gives you an iterator over the dataloader (if you're not familiar with the concept of iterators, see the Wikipedia entry). next tells the iterator to give you the next item.

So, in contrast to an iterator traversing a list, which stops at some point, next(iter(dataloader)) always has a next item to return. I assume that you have something like a number of epochs and a number of steps per epoch. Then your code would look like this

for i in range(epochs):
    # some code
    for j in range(steps_per_epoch):
        images, labels = next(iter(dataloader))
        prediction = net(images)
        loss = net.loss(prediction, labels)
        ...

Be careful with next(iter(dataloader)): each call to iter(dataloader) creates a brand-new iterator that starts at the first batch again, so the inner loop keeps re-reading the beginning of the dataset instead of advancing through it. To avoid this, take the iterator out to the top, like so:

iterator = iter(dataloader)
for i in range(epochs):
    for j in range(steps_per_epoch):
        images, labels = next(iterator)
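One caveat with the hoisted iterator: if steps_per_epoch times epochs exceeds the number of batches one pass provides, next(iterator) raises StopIteration. A small runnable sketch (the toy dataset and step counts are assumptions, not from the original answer) of restarting the iterator when a pass is exhausted:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy dataset: 10 samples, so one pass yields batches of 4, 4, 2.
dataset = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.arange(10))
dataloader = DataLoader(dataset, batch_size=4)

iterator = iter(dataloader)
seen = 0
for step in range(6):            # deliberately more steps than one pass has
    try:
        xb, yb = next(iterator)
    except StopIteration:        # pass exhausted: start a fresh one
        iterator = iter(dataloader)
        xb, yb = next(iterator)
    seen += len(yb)
```

After two full passes over the 10 samples, seen counts 20 samples.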
Florian Blume answered Oct 16 '22 21:10


Another option would be to get the entire dataset directly, without using the dataloader, provided your dataset supports slice indexing (TensorDataset does, for example), like so:

images, labels = dataset[:]
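For datasets whose __getitem__ does not accept a slice, a sketch of an alternative (an assumption beyond this answer, using only standard torch calls) is to drain the dataloader and concatenate the batches:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(torch.randn(10, 2), torch.arange(10))
dataloader = DataLoader(dataset, batch_size=4)  # shuffle=False keeps order

# Collect every batch, then stitch them back into full tensors.
image_batches, label_batches = zip(*[(x, y) for x, y in dataloader])
all_images = torch.cat(image_batches)
all_labels = torch.cat(label_batches)
```

With shuffle left off, the concatenated tensors preserve the dataset's original order.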
Jean B. answered Oct 16 '22 23:10