 

How to load a list of numpy arrays to pytorch dataset loader?

I have a huge list of numpy arrays, where each array represents an image, and I want to load it using a torch.utils.data.DataLoader object. But the documentation of torch.utils.data.DataLoader mentions that it loads data directly from a folder. How do I adapt it to my use case? I am new to PyTorch and any help would be greatly appreciated. My numpy array for a single image looks something like this. The image is an RGB image.

[[[ 70  82  94]
  [ 67  81  93]
  [ 66  82  94]
  ...,
  [182 182 188]
  [183 183 189]
  [188 186 192]]

 [[ 66  80  92]
  [ 62  78  91]
  [ 64  79  95]
  ...,
  [176 176 182]
  [178 178 184]
  [180 180 186]]

 [[ 62  82  93]
  [ 62  81  96]
  [ 65  80  99]
  ...,
  [169 172 177]
  [173 173 179]
  [172 172 178]]

 ...,
deepayan das asked Jun 08 '17 07:06



People also ask

Can I use Numpy array in PyTorch?

To input a NumPy array to a neural network in PyTorch, you need to convert the numpy.array to a torch.Tensor.
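
As a rough illustration of that conversion, here is a minimal sketch. The array values are made up; torch.from_numpy and torch.tensor are the two standard routes, and the difference worth knowing is that the first shares memory with the NumPy array while the second copies it.

```python
import numpy as np
import torch

# Made-up example data, not taken from the question.
arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

shared = torch.from_numpy(arr)  # shares memory with arr, no copy is made
copied = torch.tensor(arr)      # independent copy of the data

# Modifying the NumPy array afterwards is visible only through the shared tensor.
arr[0, 0] = 10.0
```

For a huge list of images, the no-copy route can save memory, as long as you do not modify the arrays behind the tensor's back.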

What is TensorDataset?

The Dataset class is an abstract class that is used to define new (custom) types of datasets. TensorDataset, by contrast, is a ready-to-use class that represents your data as a list of tensors.

What is the difference between a PyTorch Dataset and a PyTorch DataLoader?

Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.


1 Answer

I think what DataLoader actually requires is an input that subclasses Dataset. You can either write your own dataset class that subclasses Dataset or use TensorDataset, as I have done below:

import torch
import numpy as np
from torch.utils.data import TensorDataset, DataLoader

my_x = [np.array([[1.0, 2], [3, 4]]), np.array([[5., 6], [7, 8]])]  # a list of numpy arrays
my_y = [np.array([4.]), np.array([2.])]  # another list of numpy arrays (targets)

# Stack each list into a single array first; converting a list of arrays directly is slow.
tensor_x = torch.Tensor(np.array(my_x))  # transform to torch tensor (float32)
tensor_y = torch.Tensor(np.array(my_y))

my_dataset = TensorDataset(tensor_x, tensor_y)  # create your dataset
my_dataloader = DataLoader(my_dataset)  # create your dataloader

Works for me. Hope it helps you.
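
For the other option, writing your own class that subclasses Dataset, a minimal sketch might look like the following. Everything specific here is an assumption for illustration: the class name NumpyImageDataset is hypothetical, the images are assumed to all be the same size and stored as H x W x 3 uint8 arrays like the one in the question, the labels are assumed to be integer class ids, and scaling to [0, 1] is just one common choice.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class NumpyImageDataset(Dataset):
    """Hypothetical helper: wraps a list of H x W x 3 uint8 arrays and their labels."""

    def __init__(self, images, labels):
        assert len(images) == len(labels)
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Convert one image at a time, so the whole list is never copied at once.
        image = torch.from_numpy(self.images[idx])       # H x W x C, uint8
        image = image.permute(2, 0, 1).float() / 255.0   # C x H x W, floats in [0, 1]
        label = torch.tensor(self.labels[idx], dtype=torch.long)
        return image, label


# Made-up data: four random 32 x 32 RGB images with integer class labels.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(4)]
labels = [0, 1, 0, 1]

loader = DataLoader(NumpyImageDataset(images, labels), batch_size=2, shuffle=False)
batch_images, batch_labels = next(iter(loader))
```

Compared with TensorDataset, this keeps the data as numpy arrays and only converts the samples that are actually requested, which may matter when the list is huge.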

mbpaulus answered Sep 18 '22 14:09
