I have a dataset of images that have multiple labels. There are 100 classes in the dataset, and each image has 1 to 5 labels associated with it.
I'm following the instructions at this URL:
https://github.com/BVLC/caffe/issues/550
It says that I need to generate a text file listing the images and their labels, as in:
/home/my_test_dir/picture-foo.jpg 0
/home/my_test_dir/picture-foo1.jpg 1
In my case, since I have multi-label images, does it work to simply add more labels, as in the following?
/home/my_test_dir/picture-foo.jpg 0 2 5
/home/my_test_dir/picture-foo1.jpg 1 4
I have a feeling that it's probably not going to be that simple, and if I'm right, in what step and how should I integrate the multi-label-ness of the dataset in the process of setting up Caffe?
I believe Shai's answer is no longer up to date. Caffe supports multi-label/matrix ground truth for the HDF5 and LMDB formats. The Python snippet in this GitHub comment demonstrates how to construct multi-label LMDB ground truth (see Shai's answer for the HDF5 format). Unlike a single-label image dataset, one LMDB is constructed for the images and a second, separate LMDB is constructed for the multi-label ground truth data. The snippet deals with spatial multi-label ground truth, which is useful for pixel-wise labeling of images.
The order in which data is written to the lmdb is crucial. The order of the ground truth must match the order of the images.
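One way to keep the two databases aligned is to derive both from a single ordered list and give each record the same zero-padded key, since LMDB stores keys in lexicographic order. This is only a sketch of the bookkeeping: `paired_records` and the 8-digit key width are my own assumptions, and the actual LMDB writes are left out.

```python
def paired_records(samples, key_width=8):
    """Yield (key, image_path, labels) in one fixed order for both databases."""
    for index, (image_path, labels) in enumerate(samples):
        # Zero-padding makes lexicographic key order equal to numeric order.
        key = "{:0{w}d}".format(index, w=key_width)
        yield key, image_path, labels


# Hypothetical sample list, reusing the paths from the question.
samples = [
    ("/home/my_test_dir/picture-foo.jpg", [0, 2, 5]),
    ("/home/my_test_dir/picture-foo1.jpg", [1, 4]),
]

records = list(paired_records(samples))
# One list per database; both share the same keys in the same order.
image_items = [(key, path) for key, path, _ in records]
label_items = [(key, labels) for key, _, labels in records]
```

Writing `image_items` to the image database and `label_items` to the ground-truth database with these keys means entry i in one always corresponds to entry i in the other.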
Loss layers such as SOFTMAX_LOSS, EUCLIDEAN_LOSS, SIGMOID_CROSS_ENTROPY_LOSS also support multi-label data. However, the Accuracy layer is still limited to single-label data. You might want to follow this github issue to keep track of when this feature is added to Caffe.
Caffe supports multi-label data. You can encode the labels as n-hot vectors, e.g. [0,1,1,0,0,1,...]. You need to reshape the labels to n*k*1*1 tensors and use a sigmoid cross-entropy or Euclidean loss, not softmax (which forces sum(outputs)=1).
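A minimal sketch of the n-hot encoding with NumPy, assuming n is the number of images and k the number of classes (100 in the question). The helper name `to_n_hot` is hypothetical.

```python
import numpy as np


def to_n_hot(label_lists, num_classes):
    """Encode per-image label lists as an (n, k, 1, 1) float32 n-hot tensor."""
    targets = np.zeros((len(label_lists), num_classes, 1, 1), dtype=np.float32)
    for row, labels in enumerate(label_lists):
        # Set a 1 at every class index that is present for this image.
        targets[row, labels, 0, 0] = 1.0
    return targets


# The two example images from the question: labels {0, 2, 5} and {1, 4}.
targets = to_n_hot([[0, 2, 5], [1, 4]], num_classes=100)
```

Every image gets a vector of the same length regardless of how many labels it has, so a varying label count (1 to 5 here) needs no special handling at this stage.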
AFAIK, the current Caffe version does not support LMDB/LevelDB datasets for images with multiple labels. However, you can (and probably should) prepare your inputs in HDF5 format. Caffe's HDF5 input layer is much more flexible and will allow you to have multiple labels per input.
This answer gives a brief description of how to create HDF5 input for caffe.
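As a rough sketch of what such an HDF5 input could look like, the following writes an image array and an n-hot label matrix with h5py, plus the text file listing the .h5 files. The dataset names `data` and `label`, the 3x32x32 placeholder images, and the file names are all assumptions; the dataset names have to match whatever blob names your HDF5 data layer declares.

```python
import os
import tempfile

import h5py
import numpy as np

num_images, num_classes = 2, 100

# Placeholder pixel data; real code would load and preprocess the images.
data = np.zeros((num_images, 3, 32, 32), dtype=np.float32)

# One n-hot row per image: labels {0, 2, 5} and {1, 4} from the question.
label = np.zeros((num_images, num_classes), dtype=np.float32)
label[0, [0, 2, 5]] = 1.0
label[1, [1, 4]] = 1.0

out_dir = tempfile.mkdtemp()
h5_path = os.path.join(out_dir, "train.h5")
with h5py.File(h5_path, "w") as f:
    f.create_dataset("data", data=data)
    f.create_dataset("label", data=label)

# The HDF5 layer's source is a text file listing one .h5 path per line.
list_path = os.path.join(out_dir, "train_h5_list.txt")
with open(list_path, "w") as f:
    f.write(h5_path + "\n")
```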
Another issue you must address is that you have not only multiple labels per image, but also a varying number of labels per image. How do you define your loss per image and per label? You may have to write your own loss layer.
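One possible answer to that question is to treat each class as an independent yes/no decision and sum a binary cross-entropy term per class, which works for any number of labels per image. The function below is a plain-Python illustration of that idea under my own naming, not Caffe's layer code.

```python
import math


def sigmoid_cross_entropy(scores, targets):
    """Sum of independent per-class binary cross-entropy terms for one image."""
    loss = 0.0
    for x, t in zip(scores, targets):
        # Numerically stable form of -[t*log(s(x)) + (1-t)*log(1-s(x))],
        # where s is the logistic sigmoid and x is the raw class score.
        loss += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return loss


# Hypothetical scores for a 4-class toy problem; the image has labels {0, 2}.
targets = [1.0, 0.0, 1.0, 0.0]
good = sigmoid_cross_entropy([4.0, -4.0, 4.0, -4.0], targets)
bad = sigmoid_cross_entropy([-4.0, 4.0, -4.0, 4.0], targets)
```

Here `good` is much smaller than `bad`, and nothing in the computation depends on how many entries of `targets` are 1.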
Some layers support an "ignore label": that is, if a specific input label is assigned to the image, no loss is computed for that image. See, e.g., AccuracyLayer and SoftmaxWithLossLayer.