Keras - How are batches and epochs used in fit_generator()?

I have a video of 8000 frames, and I'd like to train a Keras model on batches of 200 frames each. I have a frame generator that loops through the video frame-by-frame and accumulates the (3 x 480 x 640) frames into a numpy matrix X of shape (200, 3, 480, 640) -- (batch size, rgb, frame height, frame width) -- and yields X and Y every 200th frame:

import cv2
import numpy as np
...
def _frameGenerator(videoPath, dataPath, batchSize):
    """
    Yield X and Y data when the batch is filled.
    """
    camera = cv2.VideoCapture(videoPath)
    width = int(camera.get(3))   # Frame width; get() returns a float.
    height = int(camera.get(4))  # Frame height.
    frameCount = int(camera.get(7))  # Number of frames in the video file.

    truthData = _prepData(dataPath, frameCount)

    X = np.zeros((batchSize, 3, height, width))
    Y = np.zeros((batchSize, 1))

    batch = 0
    for frameIdx, truth in enumerate(truthData):
        ret, frame = camera.read()
        if not ret:
            continue

        batchIndex = frameIdx % batchSize

        # OpenCV frames are (height, width, channels); move channels first.
        X[batchIndex] = frame.transpose(2, 0, 1)
        Y[batchIndex] = truth

        # Yield once the last slot of the batch has been filled.
        if batchIndex == batchSize - 1:
            batch += 1
            print("now yielding batch %d" % batch)
            yield X, Y

Here's how I run fit_generator():

        batchSize = 200
        print("Starting training...")
        model.fit_generator(
            _frameGenerator(videoPath, dataPath, batchSize),
            samples_per_epoch=8000,
            nb_epoch=10,
            verbose=args.verbosity
        )

My understanding is an epoch finishes when samples_per_epoch samples have been seen by the model, and samples_per_epoch = batch size * number of batches = 200 * 40. So after training for an epoch on frames 0-7999, the next epoch will start training again from frame 0. Is this correct?

With this setup I expect 40 batches (of 200 frames each) to be passed from the generator to fit_generator, per epoch; this would be 8000 total frames per epoch -- i.e., samples_per_epoch=8000. Then for subsequent epochs, fit_generator would reinitialize the generator such that we begin training again from the start of the video. Yet this is not the case. After the first epoch is complete (after the model logs batches 0-24), the generator picks up where it left off. Shouldn't the new epoch start again from the beginning of the training dataset?
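The arithmetic above can be checked with a small sketch. Note that `batches_per_epoch` is a hypothetical helper written for illustration, not a Keras function, and it assumes every yielded batch holds exactly `batch_size` samples.

```python
import math


def batches_per_epoch(samples_per_epoch, batch_size):
    """Number of generator batches needed to reach samples_per_epoch samples.

    Hypothetical helper for illustration only; assumes fixed-size batches.
    """
    return int(math.ceil(samples_per_epoch / float(batch_size)))


# 8000 frames in batches of 200 frames each.
expected_batches = batches_per_epoch(8000, 200)
```

If `samples_per_epoch` is not a multiple of the batch size, the last batch of the epoch pushes the sample count past the target, which is why the helper rounds up.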

If there is something incorrect in my understanding of fit_generator please explain. I've gone through the documentation, this example, and these related issues. I'm using Keras v1.0.7 with the TensorFlow backend. This issue is also posted in the Keras repo.

asked Aug 13 '16 at 19:08 by BoltzmannBrain


2 Answers

After the first epoch is complete (after the model logs batches 0-24), the generator picks up where it left off

This is an accurate description of what happens. If you want to reset or rewind the generator, you'll have to do this inside the generator itself. Note that Keras's behavior is useful in many situations. For example, you can end an epoch after seeing half of the data and then run the next epoch on the other half, which lets you monitor validation more closely. That would be impossible if the generator state were reset at every epoch.
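A minimal sketch of this behavior in plain Python, with no Keras involved: `run_epochs` is a hypothetical stand-in for a training loop that pulls a fixed number of batches per epoch from one shared generator object, and `batch_generator` is an assumed toy generator that yields batch indices instead of frames.

```python
import itertools


def batch_generator(num_batches):
    """Yield batch indices 0..num_batches-1 once, without rewinding."""
    for batch_idx in range(num_batches):
        yield batch_idx


def run_epochs(generator, batches_per_epoch, num_epochs):
    """Pull batches_per_epoch batches per epoch from the same generator.

    Hypothetical stand-in for a training loop; the generator is never
    re-created, so each epoch continues where the previous one stopped.
    """
    seen = []
    for _ in range(num_epochs):
        seen.append(list(itertools.islice(generator, batches_per_epoch)))
    return seen


# 40 batches in total, consumed as two epochs of 20 batches each.
epochs = run_epochs(batch_generator(40), batches_per_epoch=20, num_epochs=2)
```

Here the first epoch sees batches 0-19 and the second sees batches 20-39, because the generator keeps its position between epochs.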

answered Sep 29 '22 at 11:09 by yhenon


You can make your generator restart by wrapping its body in a while 1: loop; that's how I do it. The generator can then keep yielding batches for every epoch.
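A minimal sketch of the looping approach, using a plain list as a stand-in for the video frames. The name `looping_batch_generator` and the toy data are assumptions for illustration; for a real video source the outer loop would presumably also have to reopen or rewind the capture before each pass.

```python
import itertools


def looping_batch_generator(samples, batch_size):
    """Yield consecutive batches forever, restarting after each full pass."""
    while True:  # same idea as `while 1:`
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


frames = list(range(8))  # stand-in for 8 video frames
gen = looping_batch_generator(frames, batch_size=4)

# Two batches cover the data once; the third batch starts over at frame 0.
first_three = list(itertools.islice(gen, 3))
```

With this structure, choosing the per-epoch sample count so that it matches one full pass makes every epoch begin at the first frame.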

answered Sep 29 '22 at 09:09 by Adrien G.