I have a 3 TB dataset, 64 GB of RAM, a 12-core CPU, and one 12 GB GPU, and I would like to train a deep learning model on this dataset. How do I load batches asynchronously while the model trains? I want to make sure that loading data from disk never blocks the training loop while it waits for a new batch to arrive in memory.
I am not tied to any language, and the easiest library that can do this without friction wins, but I would prefer one of Torch, PyTorch, or TensorFlow.
There are two ways to split the load of a neural network across several machines: network/model parallelism and data parallelism. Today's neural networks consist of many layers. Each layer requires a set of computations that are usually represented as a graph.
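For instance, under model parallelism different parts of the layer graph live on different devices, and the forward pass hands activations across the boundary. A minimal PyTorch sketch; the two-device split, the layer sizes and the device names are illustrative assumptions, not part of the answer above:

import torch.nn as nn

class TwoDeviceNet(nn.Module):
    # First half of the layer graph lives on one device, the second
    # half on another; forward() moves activations between them.
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(512, 256), nn.ReLU()).to('cuda:0')
        self.part2 = nn.Linear(256, 10).to('cuda:1')

    def forward(self, x):
        x = self.part1(x.to('cuda:0'))
        return self.part2(x.to('cuda:1'))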
We solved this problem in the way @mo-hossny described above (not "tied to the Imagenet folder structure") with Keras (TensorFlow backend) and described it in gory detail here.
A brief summary of that: most ML tutorials show a directory structure where the class of the training (and test) examples is implied by the subdirectory. For instance, you might see subdirectories and files like data/train/cats/???.png and data/train/dogs/???.png, etc.
If instead you create a simple Pandas DataFrame to hold the unique id, class label and file path for each train/test sample, then you can shuffle this DataFrame at the start of each epoch, loop over it in mini-batches and use a generator to send each chunk to the GPU. In the background, the CPU is keeping the queue of chunks full, standing by to send each subsequent one to the GPU as soon as it finishes its current batch.
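The shuffle step itself is a one-liner with pandas; a minimal sketch, assuming df is the DataFrame shown below:

df = df.sample(frac=1).reset_index(drop=True)  # reorder all rows at the start of each epoch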
An example of such a DataFrame is:
df
       object_id   bi  multi                                     path
index
0         461756  dog  white     /path/to/imgs/756/61/blah_461756.png
1        1161756  cat  black    /path/to/imgs/756/61/blah_1161756.png
2        3303651  dog  white    /path/to/imgs/651/03/blah_3303651.png
3        3367756  dog   grey    /path/to/imgs/756/67/blah_3367756.png
4        3767756  dog   grey    /path/to/imgs/756/67/blah_3767756.png
5        5467756  cat  black    /path/to/imgs/756/67/blah_5467756.png
6        5561756  dog  white    /path/to/imgs/756/61/blah_5561756.png
7       31255756  cat   grey   /path/to/imgs/756/55/blah_31255756.png
8       35903651  cat  black   /path/to/imgs/651/03/blah_35903651.png
9       44603651  dog  black   /path/to/imgs/651/03/blah_44603651.png
10      49557622  cat  black   /path/to/imgs/622/57/blah_49557622.png
11      58164756  dog   grey   /path/to/imgs/756/64/blah_58164756.png
12      95403651  cat  white   /path/to/imgs/651/03/blah_95403651.png
13      95555756  dog   grey   /path/to/imgs/756/55/blah_95555756.png
I've included labels for binomial and multinomial versions of the problem to demonstrate that the same DataFrame and files can be used in different classification settings.
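One way to switch between the two settings (my assumption, since the generator sketch below reads a column named target) is to copy whichever label column you need into a common target column:

df['target'] = df['bi']       # binomial labels: cat vs. dog
# df['target'] = df['multi']  # multinomial labels: white / black / grey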
Once you have this going, the Keras generator code is pretty short and sweet:
train_generator = generator_from_df(df, batch_size, target_size)
where df is similar to my example above and the function generator_from_df() is defined here. It simply loops through the df in chunks of a given size; reads, normalizes and concatenates the pixel data specified in the chunk's rows; and finally yields (hence the generator) the X (pixels) and Y (labels) data. The heart of it is very similar to:
import numpy as np
from keras.preprocessing.image import img_to_array, load_img

# df is assumed to have an 'imgpath' column (file path) and a
# 'target' column (label); nbatches = len(df) // batch_size.
i, j = 0, batch_size
for _ in range(nbatches):
    sub = df.iloc[i:j]
    # Read each image, rescale pixels from [0, 255] to [-1, 1],
    # and stack the results into a single batch array.
    X = np.array([
        (2 *
         (img_to_array(load_img(f, target_size=target_size))
          / 255.0 - 0.5))
        for f in sub.imgpath])
    Y = sub.target.values
    yield X, Y
    i = j
    j += batch_size
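To get the asynchronous loading the question asks about, hand this generator to Keras and let its built-in queue do the prefetching; a sketch, where model, nbatches and the parameter values are assumptions of mine rather than part of the original post:

model.fit_generator(train_generator,
                    steps_per_epoch=nbatches,
                    epochs=10,
                    max_queue_size=10,         # batches buffered ahead of the GPU
                    workers=4,                 # background workers keep the queue full
                    use_multiprocessing=False)

While the GPU consumes one batch, up to max_queue_size more are being prepared on the CPU, which is exactly the overlap described above.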
Note the references and code in the post: we aggregated helpful hints from others on the Keras pages and here on Stack Overflow.
If you don't want to be tied to the ImageNet folder structure, you can develop your own data loader in pretty much every framework. PyTorch sample code is available at https://stackoverflow.com/a/45102798/7387369. It loads the next batch while the model trains. Set num_workers to the number of worker processes to run in parallel.
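For a flavour of what that looks like, here is a minimal sketch of a DataFrame-backed PyTorch Dataset fed through a DataLoader; the DataFrameDataset class, the label mapping and the parameter values are my illustrative assumptions:

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader

LABELS = {'cat': 0, 'dog': 1}  # hypothetical label encoding

class DataFrameDataset(Dataset):
    # Serves (image tensor, label) pairs from a DataFrame like the one above.
    def __init__(self, df, target_size=(224, 224)):
        self.df = df
        self.target_size = target_size

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img = Image.open(row.path).convert('RGB').resize(self.target_size)
        x = torch.from_numpy(np.asarray(img, dtype=np.float32) / 255.0)
        return x.permute(2, 0, 1), LABELS[row.bi]  # CHW tensor, integer label

loader = DataLoader(DataFrameDataset(df), batch_size=32, shuffle=True,
                    num_workers=12,   # worker processes prefetch batches in parallel
                    pin_memory=True)  # page-locked memory speeds up GPU transfers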