
Convert data to leveldb for caffe

I have a set of 2D data matrices in MATLAB (not image data, but single-precision numeric data).

Does anyone know how to convert 2D MATLAB matrices to the LevelDB format that Caffe requires for training a custom neural network?

I have already worked through the tutorials on training with images (using the ImageNet architecture) and on MNIST (the digit recognition dataset). However, the MNIST example didn't show how to create the database; it was provided ready-made.

asked Jun 05 '15 10:06 by mcExchange


1 Answer

I still don't know how to create a LevelDB database from my 2D data matrices for use in Caffe, but I finally solved my problem:
I ended up using Shai's suggestion to convert the data to HDF5 format. Reading and writing HDF5 files in MATLAB is straightforward with the built-in functions hdf5info(), h5read(), h5create() and h5write().

Example:
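- In your Caffe prototxt, change the type of the data layer to "HDF5Data". Note that its source parameter is not the HDF5 file itself but a text file listing the paths of one or more .h5 files, one per line: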

name: "LeNet"
layer {
  name: "mnist"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "/path/to/your/database/myMnist_train.txt"
    batch_size: 64
  }
}
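A minimal sketch of creating that list file from MATLAB. The file names (train.h5, myMnist_train.txt) and the choice of the current folder are assumptions; whatever location you use, the source entry in the prototxt has to point at the list file. Writing absolute paths (via pwd) means the list does not depend on the directory Caffe is started from.

```matlab
% Caffe's HDF5Data layer treats "source" as a text file with one HDF5 file
% path per line. File names and folder are assumptions for this sketch.
h5Path   = fullfile(pwd, 'train.h5');            % HDF5 file written by h5write
listFile = fullfile(pwd, 'myMnist_train.txt');   % what "source" should point to
h5files  = {h5Path};                             % add more .h5 files here if needed

fid = fopen(listFile, 'w');
if fid == -1
    error('Could not open %s for writing.', listFile);
end
fprintf(fid, '%s\n', h5files{:});                % one path per line
fclose(fid);
```

Splitting a large training set over several .h5 files only means adding more entries to h5files.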

Use Matlab to create HDF5 databases:
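- Caffe: the input training data has to be a 4-D array. Caffe reads it as N x C x H x W (samples x channels x height x width), but MATLAB's HDF5 functions list dimensions in reverse order, so in MATLAB the array is W x H x C x N: the first two dimensions hold your (transposed) 2D matrix.
- Example: take a 2D matrix (image or single-precision data) of size 54x24 (#rows x #cols).
- -> transpose it and stack the results into a 24x54x1xN array, where N is the number of 2D matrices (training samples). Caffe then sees an N x 1 x 54 x 24 blob.
- The labels go in a 1xN row vector in MATLAB.
- Finally, create the HDF5 database with h5create() and h5write().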
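The transposing and stacking steps above can be sketched as follows. This is a minimal sketch assuming the samples sit in a cell array; the names samples and labels and the random test data are hypothetical stand-ins for your own matrices.

```matlab
% Hypothetical example input: N random 54x24 single-precision matrices
% stored in a cell array, plus one numeric label per sample.
N = 100;
samples = arrayfun(@(k) rand(54, 24, 'single'), 1:N, 'UniformOutput', false);
labels  = randi([0 9], 1, N);

[nRows, nCols] = size(samples{1});                 % 54 x 24
trainData = zeros(nCols, nRows, 1, N, 'single');   % 24 x 54 x 1 x N
for k = 1:N
    % Transpose so that, after the dimension reversal, Caffe sees each
    % sample as 1 x 54 x 24 (channels x height x width).
    trainData(:, :, 1, k) = samples{k}.';
end
trainLabels = single(labels);                      % 1 x N row vector
```

The resulting trainData and trainLabels are the variables passed to the h5create and h5write calls.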

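% h5create errors if the dataset already exists, so start from a fresh file
if exist('train.h5', 'file'), delete('train.h5'); end
% store as single precision (h5create defaults to double)
h5create('train.h5', '/data', [24 54 1 length(trainLabels)], 'Datatype', 'single');
h5create('train.h5', '/label', [1 length(trainLabels)], 'Datatype', 'single');
h5write('train.h5', '/data', single(trainData));
h5write('train.h5', '/label', single(trainLabels));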
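  • Caffe reads the datasets whose names match the layer's top blobs, so with the prototxt above the file must contain datasets named "data" and "label".
  • Reading a database:
    Use hdf5info(filename) to get the dataset names inside an HDF5 file (newer MATLAB releases recommend h5info(filename) instead). Then use data = h5read(filename, dataset) to read a dataset.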
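A small round trip can confirm that a file has the layout Caffe expects before training. This sketch writes its own tiny file in the temporary folder (the file name and sizes are illustrative only), then lists the dataset names with h5info and reads them back with h5read.

```matlab
% Minimal round trip: write a tiny file in the temp folder, then list its
% datasets and read them back. File name and sizes are illustrative only.
fname = fullfile(tempdir, 'roundtrip_example.h5');
if exist(fname, 'file'), delete(fname); end

written = rand(24, 54, 1, 3, 'single');
h5create(fname, '/data',  size(written), 'Datatype', 'single');
h5create(fname, '/label', [1 3],         'Datatype', 'single');
h5write(fname, '/data',  written);
h5write(fname, '/label', single([0 1 2]));

info  = h5info(fname);              % recommended replacement for hdf5info
names = {info.Datasets.Name};       % dataset names without the leading '/'
data  = h5read(fname, '/data');     % 24 x 54 x 1 x 3 single array
label = h5read(fname, '/label');    % 1 x 3 single row vector
```

If names does not contain both "data" and "label", the HDF5Data layer from the prototxt above will not find its inputs.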
answered Oct 22 '22 23:10 by mcExchange