
The speed between ImageDataLayer and LMDB data layer

Caffe supports both an LMDB data layer and ImageDataLayer. Creating an LMDB database from a dataset takes time and a lot of disk space. In contrast, ImageDataLayer only needs a text file listing the images, which is very convenient. My question is: is there a big speed difference between these two kinds of layers? Thank you very much!

kli_nlpr asked Feb 28 '16 09:02



People also ask

What is Lmdb in deep learning?

LMDB database is a Key/Value db (similar to HashMap in Java or dict in Python). In order to store 4D matrices you need to understand the convention Caffe uses to save images into LMDB format.

What is Lmdb format?

Lightning Memory-Mapped Database (LMDB) is a software library that provides an embedded transactional database in the form of a key-value store. LMDB is written in C with API bindings for several programming languages.


1 Answer

LMDB is designed for fast lookup of a value by its key. The data is also stored in uncompressed form, so the machine can simply read it and pass it directly to the GPU for processing.

With ImageDataLayer, Caffe has to read each image's path and label from the text file and then use OpenCV to load the image into memory. Decompressing every image this way is computationally expensive.
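For reference, the text file ImageDataLayer reads has one image per line: a path followed by an integer label. Below is a minimal sketch of generating such a file, assuming a hypothetical layout with one subdirectory per class; the function name `write_image_list` and the directory layout are illustrative and not part of Caffe.

```python
import os


def write_image_list(root_dir, list_path, extensions=(".jpg", ".jpeg", ".png")):
    """Write one 'path label' line per image found under root_dir.

    Assumes (hypothetically) that root_dir contains one subdirectory per
    class. Labels are assigned by the sorted order of subdirectory names.
    Returns the number of lines written.
    """
    class_names = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    count = 0
    with open(list_path, "w") as out:
        for label, name in enumerate(class_names):
            class_dir = os.path.join(root_dir, name)
            for fname in sorted(os.listdir(class_dir)):
                # Skip anything that does not look like an image file.
                if fname.lower().endswith(extensions):
                    out.write("%s %d\n" % (os.path.join(class_dir, fname), label))
                    count += 1
    return count
```

The resulting file is what the layer's `source` parameter would point to. No image data is copied or converted, which is why this route is so cheap to set up compared with building an LMDB.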

But LMDB does not always give the best performance; it depends heavily on the configuration of the machine. Consider a batch size of 256 images, each of size 227x227x3, on a machine with a very good GPU and a high-end processor. A single uncompressed image in LMDB occupies about 151 KB, so a whole batch occupies about 37 MB. If the GPU can process 10 batches a second, the hard disk has to sustain a read speed of roughly 370 MB/s. With an ordinary SATA or external hard disk, reading such large chunks of data becomes a bottleneck because of the limits of the disk.
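The arithmetic behind these figures can be reproduced with a short sketch. It assumes one byte per channel value (uint8 pixels) and ignores any per-record overhead in the database; the function name is illustrative.

```python
def required_read_rate(width, height, channels, batch_size, batches_per_second):
    """Return (bytes per image, bytes per batch, bytes per second needed).

    Assumes uncompressed uint8 pixels, i.e. one byte per channel value,
    and no storage overhead per record.
    """
    bytes_per_image = width * height * channels
    bytes_per_batch = bytes_per_image * batch_size
    bytes_per_second = bytes_per_batch * batches_per_second
    return bytes_per_image, bytes_per_batch, bytes_per_second


img, batch, rate = required_read_rate(227, 227, 3, 256, 10)
print("per image: %.0f KiB" % (img / 1024.0))
print("per batch: %.1f MiB" % (batch / 1024.0 ** 2))
print("required:  %.1f MiB/s" % (rate / 1024.0 ** 2))
```

Plugging in your own image size, batch size, and measured batches per second gives the sustained read rate your storage has to deliver before LMDB stops being the bottleneck.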

If Caffe cannot fetch data at the required rate, the GPU is left waiting and the whole training process slows down. In that situation, reading 256 compressed images and decoding them with a multi-core build of OpenCV may keep the data prefetching going more effectively than reading from an LMDB.

This will not happen if the LMDB data is stored on an SSD, though!
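To see which side of that line a given disk falls on, you can time a plain sequential read of a large file stored on it. This is only a rough sketch: it uses ordinary file reads rather than LMDB's memory-mapped access, and a file that is already in the operating system's page cache will report a rate far above what the disk can actually deliver. The function name and the 4 MiB chunk size are arbitrary choices.

```python
import time


def sequential_read_rate(path, chunk_size=4 * 1024 * 1024):
    """Read the file at `path` front to back and return bytes per second.

    A rough estimate only: results are inflated if the file is already
    cached in memory by the operating system.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very small files.
    return total / elapsed if elapsed > 0 else float("inf")
```

Running it on a file of several gigabytes that has not been read recently, such as the `data.mdb` file inside an LMDB directory, and comparing the result with the required rate gives a first indication of whether the disk or the GPU will be the limiting factor.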

Anoop K. Prabhu answered Dec 27 '22 04:12
