
What is the structure of a Torch dataset?

I am beginning to use Torch 7 and I want to build my own dataset for classification. I have already created the pixel images and the corresponding labels, but I do not know how to feed that data to Torch. I read other people's code and found that they use datasets with the '.t7' extension, which I think is a tensor type. Is that right? I also wonder how I can convert my pixel images (I made them in Matlab from the MNIST dataset) into a '.t7' file compatible with Torch. There must be a dataset structure in the t7 format, but I cannot find it (nor the one for the labels).

To sum up, I have pixel images and labels and want to convert them to a t7 format compatible with Torch.

Thanks in advance!

Minkyu Choi asked Jun 06 '16 12:06



1 Answer

The '.t7' datasets are tables of labeled Tensors. For example, the following Lua code:

if (not paths.filep("cifar10torchsmall.zip")) then
    os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
    os.execute('unzip cifar10torchsmall.zip')
end
Readed_t7 = torch.load('cifar10-train.t7')
print(Readed_t7)

When run in iTorch, this prints:

{
  data : ByteTensor - size: 10000x3x32x32
  label : ByteTensor - size: 10000
}

This means the file contains a table of two ByteTensors, one stored under the key "data" and the other under the key "label".
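As a minimal sketch of how such a table can be built, saved, and indexed, assuming Torch 7 is installed: the toy sizes, the fill values, and the temporary file name below are made up for illustration and are not taken from the CIFAR file.

```lua
require 'torch'

-- Hypothetical toy dataset: 4 RGB images of 32x32 pixels, all zeros.
local toy = {
   data  = torch.ByteTensor(4, 3, 32, 32):zero(),
   label = torch.ByteTensor(4):fill(1),
}

-- Round-trip through a .t7 file (the path is an arbitrary temporary name).
local path = os.tmpname()
torch.save(path, toy)
local loaded = torch.load(path)
os.remove(path)

-- The fields are ordinary Tensors, indexed from 1.
local n = loaded.data:size(1)        -- number of samples
local first_image = loaded.data[1]   -- a 3x32x32 ByteTensor
local first_label = loaded.label[1]  -- a plain Lua number
print(n, first_image:size(), first_label)
```

The same indexing applies to the table loaded from 'cifar10-train.t7', only with 10000 samples.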

To answer your question: first read your images (with torchx, for example: https://github.com/nicholas-leonard/torchx/blob/master/README.md ), then put them in a table together with your Tensor of labels. The following code is just a draft to help you out. It assumes that there are two classes, that all your images are in the same folder, and that they are ordered by class.

require 'torchx'  -- provides paths.indexdir
require 'image'   -- provides image.load

--Read all your dataset (the chosen extension is png)
files = paths.indexdir("/Path/to/your/images/", 'png', true)
data1 = {}
for i=1,files:size() do
   local img1 = image.load(files:filename(i),3)
   table.insert(data1, img1)
end

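--Create the table of labels according to the class ordering
--(number_of_images_of_the_first_class is a placeholder: set it to your own count)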
label1 = {}
for i=1, #data1 do
    if i <= number_of_images_of_the_first_class then
        label1[i] = 1
    else
        label1[i] = 2
    end
end

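--Reshape the tables to Tensors
--(assumes every image has the same height and width as the first one)
label = torch.Tensor(label1)
data = torch.Tensor(#data1, 3, data1[1]:size(2), data1[1]:size(3))
for i=1, #data1 do
    data[i] = data1[i]
end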

--Create the table to save
Data_to_Write = { data = data, label = label }

--Save the table in the /tmp
torch.save("/tmp/Saved_Data.t7", Data_to_Write)

It should be possible to write cleaner code, but this version spells out every step and works with Torch 7 and Jupyter 5.0.0.
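If you have more than two classes, the labeling loop can be generalized. The following is only a sketch in plain Lua (no Torch needed), assuming the images are still sorted by class and that you know how many images each class contains. The helper name labels_from_counts and the example counts are hypothetical, not part of Torch or torchx.

```lua
-- Build a label table for images that are sorted by class.
-- counts[k] is the number of images in class k (hypothetical input).
local function labels_from_counts(counts)
   local labels = {}
   for class, n in ipairs(counts) do
      for _ = 1, n do
         labels[#labels + 1] = class
      end
   end
   return labels
end

-- Example: 3 images of class 1, then 2 of class 2, then 1 of class 3.
local labels = labels_from_counts({3, 2, 1})
print(table.concat(labels, " "))  -- prints "1 1 1 2 2 3"
```

The resulting table can be passed to torch.Tensor in the same way as label1 in the draft.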

Hope it helps.

Regards

Clement Bouvier answered Sep 20 '22 16:09