
Clarification about keras.utils.Sequence

Tags: python, multithreading, multiprocessing, deep-learning, keras

Keras has very little documentation on keras.utils.Sequence. The only reason I want to derive my batch generator from keras.utils.Sequence is that I don't want to write a thread pool with a queue myself, but I'm not sure it's the best choice for my task. Here are my questions:

1. What should __len__ return if I have a random generator and no predefined 'list' of samples?
2. How should keras.utils.Sequence be used with fit_generator? I'm mostly interested in the max_queue_size, workers, use_multiprocessing, and shuffle parameters.
3. What other options are available in Keras?

asked Dec 04 '18 19:12

mrgloom

1 Answer

1. Anything you want, keeping in mind that one epoch will draw len(sequence) batches from the Sequence.
2. There is no secret: use it like any other generator, with the difference that you can pass steps_per_epoch=len(generator) or steps_per_epoch=None.
   - max_queue_size: any value; this is the maximum number of batches that will be loaded and kept waiting in memory until it is their turn to go into the model
   - workers: any value; this is the number of parallel "threads" (forgive me if the name is not precise) that will be loading batches
   - use_multiprocessing: I don't know this one. It was not necessary for me, and the only time I tried it, it was buggy enough to freeze my machine
   - shuffle: from the documentation: "Boolean. Whether to shuffle the order of the batches at the beginning of each epoch. Only used with instances of Sequence (keras.utils.Sequence). Has no effect when steps_per_epoch is not None."
3. I think this is it. If you want to parallelize the model itself, then you might want to read about multi-GPU training, I guess.
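To make point 1 concrete, here is a minimal sketch of a random batch source whose length is simply a number you choose. It is written as a plain class that follows the Sequence protocol (`__len__`, `__getitem__`, `on_epoch_end`) so that it runs without Keras installed; in real use it would subclass keras.utils.Sequence and return NumPy arrays. The class name, parameter names, and the seeding scheme are illustrative assumptions, not part of Keras.

```python
import random


class RandomBatchSequence:
    """Index-based random batch source following the Sequence protocol.

    Assumption: a real version would subclass keras.utils.Sequence and
    return NumPy arrays; plain lists keep this sketch dependency-free.
    """

    def __init__(self, batches_per_epoch, batch_size, n_features, seed=0):
        # All parameter names here are hypothetical.
        self.batches_per_epoch = batches_per_epoch
        self.batch_size = batch_size
        self.n_features = n_features
        self.seed = seed
        self.epoch = 0

    def __len__(self):
        # There is no underlying list of samples, so the length is just
        # the number of batches we decide one epoch should contain.
        return self.batches_per_epoch

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Seed from (seed, epoch, idx): the same index gives the same batch
        # within an epoch, whichever worker happens to ask for it.
        rng = random.Random(f"{self.seed}-{self.epoch}-{idx}")
        x = [[rng.random() for _ in range(self.n_features)]
             for _ in range(self.batch_size)]
        y = [rng.randint(0, 1) for _ in range(self.batch_size)]
        return x, y

    def on_epoch_end(self):
        # Hook called after each epoch; bumping the epoch counter changes
        # the seeds so the next epoch draws fresh random batches.
        self.epoch += 1


seq = RandomBatchSequence(batches_per_epoch=100, batch_size=32, n_features=4)
x, y = seq[0]
print(len(seq), len(x), len(x[0]), len(y))  # 100 32 4 32
```

Assuming a compiled Keras 2.x model called `model` (hypothetical here) and a real keras.utils.Sequence subclass, the call would look roughly like `model.fit_generator(seq, epochs=10, workers=4, max_queue_size=10)`, with steps_per_epoch left as None so that `len(seq)` is used. Note that fit_generator is deprecated in later TensorFlow releases in favor of fit.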

Advantages of Sequence over a regular generator:

With a Sequence, it's possible to keep track of which batches were already taken and which batches were sent to which thread for loading, and there will never be a conflict because everything is based on indices.

With a generator, parallel workers lose track of which batches were already taken, because threads don't talk to each other, and there is no other option than yielding batch by batch sequentially.
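The index-based idea can be illustrated with a small sketch. This is not Keras's internal loading code, only a toy version using the standard library: each worker is handed an explicit batch index, so no batch is fetched twice and the results come back in order. A plain generator offers only "give me the next one", so it cannot be split across workers this way. The class and function names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class SquaresSequence:
    """Toy index-based source: batch idx holds the squares of its sample ids."""

    def __init__(self, n_batches, batch_size):
        self.n_batches = n_batches
        self.batch_size = batch_size

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        start = idx * self.batch_size
        return [i * i for i in range(start, start + self.batch_size)]


def load_in_parallel(seq, workers):
    # Every task carries its own index, so workers never need to agree on
    # "whose turn it is"; pool.map returns results in index order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(seq.__getitem__, range(len(seq))))


seq = SquaresSequence(n_batches=4, batch_size=3)
batches = load_in_parallel(seq, workers=2)
print(batches[0], batches[3])  # [0, 1, 4] [81, 100, 121]
```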

Advantages of generators and sequences over a loop:

In a loop, you will "wait for batch load", "wait for model training", "wait for batch load", "wait for model training".

With fit_generator, batches are loaded while the model is training, so both things happen simultaneously.

For very simple generators, there won't be a big impact. For complex generators, augmenters, big image loaders, etc., the generation time is very significant and may severely impact your speed.
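The overlap described above can be sketched with a background thread and a bounded queue. This is a simplified, single-worker illustration of the idea behind max_queue_size, not Keras's actual implementation; `prefetch`, `load_batch`, and `train_step` are hypothetical names, and the sleeps merely stand in for real loading and training work.

```python
import queue
import threading
import time


def prefetch(n_batches, load_batch, max_queue_size):
    """Yield batches while a background thread keeps loading the next ones."""
    q = queue.Queue(maxsize=max_queue_size)
    sentinel = object()  # marks the end of the epoch

    def producer():
        for idx in range(n_batches):
            q.put(load_batch(idx))  # blocks while the queue is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item


def load_batch(idx):
    time.sleep(0.01)  # stand-in for slow loading or augmentation
    return idx


def train_step(batch):
    time.sleep(0.01)  # stand-in for one model update
    return batch


# While train_step runs on batch i, the producer is already loading i + 1.
seen = [train_step(b) for b in prefetch(5, load_batch, max_queue_size=2)]
print(seen)  # [0, 1, 2, 3, 4]
```

The bounded queue is the design choice that matters: a larger max_queue_size lets the loader run further ahead at the cost of holding more batches in memory.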

answered Oct 09 '22 21:10

Daniel Möller
