Keras has very little documentation about keras.utils.Sequence. Actually, the only reason I want to derive my batch generator from keras.utils.Sequence is to avoid writing a thread pool with a queue myself, but I'm not sure it's the best choice for my task. Here are my questions:
What should __len__ return if I have a random generator and don't have any predefined list of samples? keras.utils.Sequence should be used with fit_generator, and I'm mostly interested in its max_queue_size, workers, use_multiprocessing, and shuffle parameters.

Subclass keras.utils.Sequence and define the methods __init__, __getitem__, and __len__. In addition, you can define the method on_epoch_end, which is called at the end of each epoch and is usually used to shuffle the sample indexes. There is an example in the link you gave, the TensorFlow Sequence documentation.
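A minimal sketch of such a subclass (the class name, data arrays, and batch size here are hypothetical, just to show the four methods):

```python
import numpy as np
from tensorflow import keras

class MyBatchGenerator(keras.utils.Sequence):
    """Minimal Sequence sketch: serves (x, y) batches from in-memory arrays."""

    def __init__(self, x, y, batch_size=32):
        self.x = x
        self.y = y
        self.batch_size = batch_size
        self.indexes = np.arange(len(self.x))

    def __len__(self):
        # Number of batches per epoch (last partial batch included).
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        # idx is a batch index in [0, len(self)); return one batch.
        batch_idx = self.indexes[idx * self.batch_size:(idx + 1) * self.batch_size]
        return self.x[batch_idx], self.y[batch_idx]

    def on_epoch_end(self):
        # Called at the end of each epoch; commonly used to shuffle indexes.
        np.random.shuffle(self.indexes)
```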
__len__ should return the number of batches in the Sequence: fit_generator takes len(generator) batches from the Sequence per epoch, so you can pass steps_per_epoch=len(generator) or simply steps_per_epoch=None.
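For example, with a hypothetical dataset of 1,000 samples and a batch size of 32:

```python
import math

num_samples = 1000  # hypothetical dataset size
batch_size = 32

# __len__ should return the number of batches per epoch;
# a common implementation is a ceiling division:
print(math.ceil(num_samples / batch_size))  # → 32
```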
As for the fit_generator parameters:

max_queue_size: any value; this is how many loaded batches will be kept waiting in memory until their turn to go into the model.

workers: any value; this is the number of parallel "threads" (forgive me if the name is not precise) that will be loading batches.

use_multiprocessing: I don't know this one. It was not necessary for me, and the only time I tried it, it was buggy enough to freeze my machine.

shuffle: from the documentation: "Boolean. Whether to shuffle the order of the batches at the beginning of each epoch. Only used with instances of Sequence (keras.utils.Sequence). Has no effect when steps_per_epoch is not None."

Advantages of Sequence over a regular generator:
With a Sequence, it's possible to keep track of which batches were already taken and which batches are sent to which thread for loading, and there will never be a conflict because access is based on indices.

With a regular generator, parallel processing loses track of which batches were already taken, because threads don't talk to each other and there is no option other than yielding batch after batch sequentially.
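For context, the "thread pool with queue" the question hopes to avoid writing looks roughly like the sketch below; the workers and max_queue_size parameters of fit_generator control exactly these two knobs. All names here are hypothetical, and the lock plays the role of the index-based coordination described above:

```python
import queue
import threading
import time

def load_batch(i):
    # Stand-in for expensive batch loading (e.g. reading and augmenting images).
    time.sleep(0.01)
    return f"batch-{i}"

def prefetch(num_batches, workers=2, max_queue_size=4):
    """Rough sketch of what fit_generator does behind the scenes:
    `workers` threads load batches and park them in a bounded queue
    (`max_queue_size`) until the training loop consumes them."""
    q = queue.Queue(maxsize=max_queue_size)  # batches waiting in memory
    indexes = iter(range(num_batches))
    lock = threading.Lock()

    def worker():
        while True:
            with lock:  # index-based: no two workers take the same batch
                i = next(indexes, None)
            if i is None:
                return
            q.put(load_batch(i))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    batches = [q.get() for _ in range(num_batches)]
    for t in threads:
        t.join()
    return batches

print(sorted(prefetch(6)))  # six batches, possibly loaded out of order
```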
Advantages of generators and sequences over a loop
In a loop, you alternate: "wait for batch load", "wait for model training", "wait for batch load", "wait for model training", and so on. With fit_generator, batches are loaded while the model is training; both things happen simultaneously.

For very simple generators, there won't be a big impact. For complex generators, augmenters, big image loaders, etc., the generation time is very significant and may severely impact your speed.
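That overlap can be sketched with a toy benchmark, where sleeps stand in for real loading and training time (the durations and step count are made up):

```python
import threading
import time

def load_batch():
    time.sleep(0.05)  # simulated batch loading

def train_step():
    time.sleep(0.05)  # simulated training step

# Sequential loop: load, train, load, train...
t0 = time.perf_counter()
for _ in range(4):
    load_batch()
    train_step()
sequential = time.perf_counter() - t0

# Overlapped: load the next batch in a thread while training on the current one.
t0 = time.perf_counter()
loader = threading.Thread(target=load_batch)
loader.start()
for _ in range(4):
    loader.join()  # wait for the batch loaded in the background
    loader = threading.Thread(target=load_batch)
    loader.start()  # kick off the next load...
    train_step()    # ...while training on the current batch
loader.join()  # final prefetched batch (unused here)
overlapped = time.perf_counter() - t0

print(sequential > overlapped)  # overlapping hides most of the loading time
```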