I am doing regression using caffe, and my test.txt
and train.txt
files are like this:
/home/foo/caffe/data/finetune/flickr/3860781056.jpg 2.0
/home/foo/caffe/data/finetune/flickr/4559004485.jpg 3.6
/home/foo/caffe/data/finetune/flickr/3208038920.jpg 3.2
/home/foo/caffe/data/finetune/flickr/6170430622.jpg 4.0
/home/foo/caffe/data/finetune/flickr/7508671542.jpg 2.7272
My problem is it seems caffe does not allow float labels like 2.0, when I use float labels while reading, for example the 'test.txt'
file caffe only
recognizes
a total of 1 images
which is wrong.
But when I for example change the 2.0 to 2 in the file and the following lines same, caffe now gives
a total of 2 images
implying that the float labels are responsible for the problem.
Can anyone help me here, to solve this problem, I definitely need to use float labels for regression, so does anyone know about a work around or solution for this? Thanks in advance.
EDIT For anyone facing a similar issue use caffe to train Lenet with CSV data might be of help. Thanks to @Shai.
When using the image dataset input layer (with either lmdb
or leveldb
backend) caffe only supports one integer label per input image.
If you want to do regression, and use floating point labels, you should try and use the HDF5 data layer. See for example this question.
In python you can use h5py
package to create hdf5 files.
import h5py, os
import caffe
import numpy as np
SIZE = 224 # fixed size to all images
with open( 'train.txt', 'r' ) as T :
lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' )
y = np.zeros( (len(lines),1), dtype='f4' )
for i,l in enumerate(lines):
sp = l.split(' ')
img = caffe.io.load_image( sp[0] )
img = caffe.io.resize( img, (SIZE, SIZE, 3) ) # resize to fixed size
# you may apply other input transformations here...
# Note that the transformation should take img from size-by-size-by-3 and transpose it to 3-by-size-by-size
# for example
# transposed_img = img.transpose((2,0,1))[::-1,:,:] # RGB->BGR
X[i] = transposed_img
y[i] = float(sp[1])
with h5py.File('train.h5','w') as H:
H.create_dataset( 'X', data=X ) # note the name X given to the dataset!
H.create_dataset( 'y', data=y ) # note the name y given to the dataset!
with open('train_h5_list.txt','w') as L:
L.write( 'train.h5' ) # list all h5 files you are going to use
Once you have all h5
files and the corresponding test files listing them you can add an HDF5 input layer to your train_val.prototxt
:
layer {
type: "HDF5Data"
top: "X" # same name as given in create_dataset!
top: "y"
hdf5_data_param {
source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
batch_size: 32
}
include { phase:TRAIN }
}
Clarification:
When I say "caffe only supports one integer label per input image" I do not mean that the leveldb/lmdb containers are limited, I meant the tools of caffe, specifically the convert_imageset
tool.
At closer inspection, it seems like caffe stores data of type Datum
in leveldb/lmdb and the "label" property of this type is defined as integer (see caffe.proto) thus when using caffe interface to leveldb/lmdb you are restricted to a single int32 label per image.
Shai's answer already covers saving float labels to HDF5 format. In case LMDB is required/preferred, here's a snippet on how to create an LMDB from float data (adapted from this github comment):
import lmdb
import caffe
def scalars_to_lmdb(scalars, path_dst):
db = lmdb.open(path_dst, map_size=int(1e12))
with db.begin(write=True) as in_txn:
for idx, x in enumerate(scalars):
content_field = np.array([x])
# get shape (1,1,1)
content_field = np.expand_dims(content_field, axis=0)
content_field = np.expand_dims(content_field, axis=0)
content_field = content_field.astype(float)
dat = caffe.io.array_to_datum(content_field)
in_txn.put('{:0>10d}'.format(idx) dat.SerializeToString())
db.close()
I ended up transposing, switching the channel order, and using unsigned ints rather than floats to get results. I suggest reading an image back from your HDF5 file to make sure it displays correctly.
First read the image as unsigned ints:
img = np.array(Image.open('images/' + image_name))
Then change the channel order from RGB to BGR:
img = img[:, :, ::-1]
Finally, switch from Height x Width x Channels to Channels x Height x Width:
img = img.transpose((2, 0, 1))
Merely changing the shape will scramble your image and ruin your data!
To read back the image:
with h5py.File(h5_filename, 'r') as hf:
images_test = hf.get('images')
targets_test = hf.get('targets')
for i, img in enumerate(images_test):
print(targets_test[i])
from skimage.viewer import ImageViewer
viewer = ImageViewer(img.reshape(SIZE, SIZE, 3))
viewer.show()
Here's a script I wrote which deals with two labels (steer and speed) for a self-driving car task: https://gist.github.com/crizCraig/aa46105d34349543582b177ae79f32f0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With