
Why does Keras halt at the first epoch when I attempt to train it using fit_generator?

I'm using Keras to fine-tune an existing VGG16 model and am using fit_generator to train the last 4 layers. Here's the relevant code that I'm working with:

# Create the model
model = models.Sequential()

# Add the vgg convolutional base model
model.add(vgg_conv)

# Add new layers
model.add(layers.Flatten())
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(5, activation='softmax'))

# Show a summary of the model. Check the number of trainable params
model.summary()
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)

#Change the batchsize according to the system RAM
train_batchsize = 100
val_batchsize = 10

train_dir='training_data/train'
validation_dir='training_data/validation'

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(image_size1, image_size2),
    batch_size=train_batchsize,
    class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(image_size1, image_size2),
    batch_size=val_batchsize,
    class_mode='categorical',
    shuffle=False)

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])

# Train the model
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples/train_generator.batch_size,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples/validation_generator.batch_size,
    verbose=1)

The issue is that when I run my script to train the model, it works fine until the actual training begins. Here, it gets stuck at epoch 1/30.

Layer (type)                 Output Shape              Param #
=================================================================
vgg16 (Model)                (None, 15, 20, 512)       14714688
_________________________________________________________________
flatten_1 (Flatten)          (None, 153600)            0
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              157287424
_________________________________________________________________
dropout_1 (Dropout)          (None, 1024)              0
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 5125
=================================================================
Total params: 172,007,237
Trainable params: 164,371,973
Non-trainable params: 7,635,264
_________________________________________________________________
Found 1989 images belonging to 5 classes.
Found 819 images belonging to 5 classes.
Epoch 1/30

Unfortunately, this is where it stops. I looked around online, and I believe the problem lies in fit_generator. There are claims that the code for fit_generator in Keras is buggy. However, most other people experiencing issues with epochs get stuck on later epochs (for example, someone runs it for 20 epochs and it halts at epoch 19/20).

How would I go about fixing this issue? This is my first time doing deep learning, so I'm quite confused and would appreciate any help. Do I just need to switch to model.fit()?

asked Mar 06 '23 by sjgandhi2312

2 Answers

You have to pass valid integers to fit_generator() for the steps_per_epoch and validation_steps parameters, not floats. Use floor division (//) instead of plain division:

history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples//train_generator.batch_size,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples//validation_generator.batch_size,
    verbose=1)

The second factor I can see is that your model has roughly 164M trainable parameters, which has huge memory consumption, particularly coupled with a high batch size. You should use images with a lower resolution; note that in many cases we can get better results with them.
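To see where that parameter count comes from: VGG16 downsamples its input by a factor of 32, so the (15, 20, 512) output shape in the summary implies a 480x640 input, and flattening that feature map makes the first Dense layer enormous. A rough sketch of the arithmetic (plain Python, no Keras needed; the helper name is just for illustration):

```python
# Parameter count of the first Dense layer as a function of input resolution.
# VGG16's five pooling stages each halve height and width, so 32x total.
def dense_params(height, width, channels=512, units=1024):
    flat = (height // 32) * (width // 32) * channels  # Flatten output size
    return flat * units + units                        # weights + biases

print(dense_params(480, 640))  # 157287424 -- matches dense_1 in the summary
print(dense_params(224, 224))  # 25691136  -- ~6x smaller at VGG16's native size
```

So simply resizing the inputs to 224x224 (VGG16's native resolution) cuts the Dense layer from ~157M to ~26M parameters.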

answered Apr 08 '23 by Geeocode

I had the same issue and resolved it by setting validation_steps=validation_size//batch_size (integer floor division).
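For reference, here is the difference between the float expression in the question and the integer fix, using the question's 819 validation images (math.ceil is a common alternative that also covers the last partial batch):

```python
import math

samples, batch_size = 819, 10           # counts from the question
print(samples / batch_size)             # 81.9 -- a float, which fit_generator chokes on
print(samples // batch_size)            # 81   -- floor division drops the last 9 images
print(math.ceil(samples / batch_size))  # 82   -- includes the final partial batch
```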

answered Apr 08 '23 by KDLin