I have a dataset of 180k images on which I try to recognize the characters (license plate recognition). All of these license plates contain seven characters, and 35 characters are possible, so the output vector y has shape (7, 35). I therefore one-hot encoded every license plate label.
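For illustration, the encoding looks roughly like this (the exact 35-character alphabet below is an assumption):

import numpy as np

# Assumed 35-character alphabet: 10 digits + 25 letters (no "O")
ALPHABET = "0123456789ABCDEFGHIJKLMNPQRSTUVWXYZ"

def encode_plate(plate):
    """One-hot encode a 7-character plate into an array of shape (7, 35)."""
    y = np.zeros((7, 35), dtype=np.float32)
    for i, ch in enumerate(plate):
        y[i, ALPHABET.index(ch)] = 1.0
    return y

y = encode_plate("AB12345")   # y.shape == (7, 35)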
I used the bottom of the EfficientNet-B0 model (https://keras.io/api/applications/efficientnet/#efficientnetb0-function) together with a customized top, which is divided into 7 branches (because of the seven characters per license plate). I used ImageNet weights and froze the bottom layers of efnB0_model:
import math

import numpy as np
from skimage.io import imread
from skimage.transform import resize
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.layers import (Activation, BatchNormalization, Concatenate,
                                     Dense, Dropout, GlobalAveragePooling2D, Input, Reshape)
from tensorflow.keras.models import Model

# Backbone: EfficientNet-B0 with ImageNet weights, frozen as described above
efnB0_model = EfficientNetB0(include_top=False, weights="imagenet")
efnB0_model.trainable = False

def create_model(input_shape=(224, 224, 3)):
    input_img = Input(shape=input_shape)
    x = efnB0_model(input_img)
    x = GlobalAveragePooling2D(name="avg_pool")(x)
    backbone = Dropout(0.2)(x)
    # One classification head per character position
    branches = []
    for i in range(7):
        branch = Dense(360, name="branch_" + str(i) + "_Dense_16000")(backbone)
        branch = BatchNormalization()(branch)
        branch = Activation("relu")(branch)
        branch = Dropout(0.2)(branch)
        branch = Dense(35, activation="softmax",
                       name="branch_" + str(i) + "_output")(branch)
        branches.append(branch)
    # Stack the 7 per-character softmax outputs into a (7, 35) tensor
    output = Concatenate(axis=1)(branches)
    output = Reshape((7, 35))(output)
    return Model(input_img, output)

model = create_model()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
For training and validating the model I only use 10,000 training images and 3,000 validation images, because the large model combined with the full dataset would make training very, very slow.
I use this DataGenerator to feed batches to my model:
from tensorflow.keras.utils import Sequence

class DataGenerator(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Load and resize the images of the current batch; skimage's
        # resize already returns floats scaled to [0, 1], so no extra
        # division by 255 is needed
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_x = np.array([resize(imread(file_name), (224, 224))
                            for file_name in batch_x])
        batch_y = np.array(self.y[idx * self.batch_size:(idx + 1) * self.batch_size])
        return batch_x, batch_y
I fit the model using this code:
model.fit_generator(generator=training_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=num_train_samples // 32,
                    validation_steps=num_val_samples // 32,
                    epochs=10, workers=6, use_multiprocessing=True)
Now, after several epochs of training, I observed big differences between training accuracy and validation accuracy. I think one reason for that is the small amount of data. Which other factors influence this overfitting in my model? Do you think there is something completely wrong with my code/model? Do you think the model is too big and complex, or is it maybe due to the preprocessing of the data?
Note: I already experimented with data augmentation and tried the model without transfer learning. That leads to poor results on training AND validation data. So, is there anything else I could do?

Are you sure this is the correct approach to follow? EfficientNet is a model created for image classification, while your task demands correct localization of 7 characters in one image, recognition of each of them, and also keeping the order of the characters. Maybe an approach of detection + segmentation followed by recognition, like in this medium post, is more efficient (pun intended). Even though I think this is very likely your real problem, I will try to answer your original question.
There's a very good guide here in the Keras documentation on how to use EfficientNet for transfer learning. I will try to summarize some tips below.
From your question, it seems that you are not even doing the fine-tuning step, which is essential for the network to learn the task better.
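As a rough illustration (assuming the efnB0_model backbone and the compiled model from your code, and following the recipe from that guide): after training the head with the backbone frozen, unfreeze the backbone, keep the BatchNormalization layers frozen, and continue training with a much lower learning rate.

from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.optimizers import Adam

# Sketch of the fine-tuning step, assuming `efnB0_model` and `model`
# from the question; the learning rate is an assumed starting point.
efnB0_model.trainable = True
for layer in efnB0_model.layers:
    # Keep BatchNorm layers frozen, as the Keras guide recommends
    if isinstance(layer, BatchNormalization):
        layer.trainable = False

# Recompile with a much lower learning rate so the pre-trained weights
# are only nudged, then continue training for a few more epochs
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])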
"Now, after several epochs of training, I observed big differences regarding training accuracy and validation accuracy."
By "several", how many epochs do you mean? From the image you put in the question, I think the second complete epoch is too soon to infer that your model is overfitting. Also, from the code (10 epochs) and from the image you posted (20 epochs), I would say to train for more epochs, like 40.
Increase the dropout. Try some configurations like 30%, 40%, 50%.
Data augmentation will in practice increase the number of samples you have. However, you have 180K images and are only using 10K of them; data augmentation is good, but when you have more real images available, try using them first. From the guide I mentioned, it seems feasible to train with more images using this model on Google Colab, so try to increase the training set size. Still on the topic of data augmentation: some transformations may be harmful to your task, like too much rotation or any reflection, since you are trying to recognize numbers and letters.
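For instance, a minimal sketch of a plate-friendly augmentation setup with Keras' ImageDataGenerator (the specific ranges are assumptions to tune, and flips are deliberately left out):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Mild, plate-friendly augmentation: small rotations/shifts/zoom and
# brightness jitter, but no horizontal or vertical flips, which would
# corrupt the characters. All ranges are assumed starting values.
datagen = ImageDataGenerator(rotation_range=5,
                             width_shift_range=0.05,
                             height_shift_range=0.05,
                             zoom_range=0.05,
                             brightness_range=(0.8, 1.2))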
Reducing the batch size to 16 may provide more regularization, which helps to fight overfitting. Speaking of regularization, try applying weight regularization to the dense layers you are adding; see the sketch below.
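A minimal sketch of both ideas, assuming the create_model() and DataGenerator from your question (the L2 factor and the x_train/y_train variable names are assumptions):

from tensorflow.keras.regularizers import l2

# Inside the branch loop of create_model(), add L2 weight decay to the
# Dense layers (1e-4 is an assumed starting value to tune):
branch = Dense(360, kernel_regularizer=l2(1e-4),
               name="branch_" + str(i) + "_Dense_16000")(backbone)

# ...and build the generators with the smaller batch size
# (x_train/y_train etc. are hypothetical variable names):
training_generator = DataGenerator(x_train, y_train, batch_size=16)
validation_generator = DataGenerator(x_val, y_val, batch_size=16)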
After quickly reading the paper you linked, I reaffirm my point about the epochs, since in the paper the results are shown for 100 epochs. Also, from the charts in the paper, it is not possible to confirm that the author did not have overfitting too. Additionally, the changes to the Xception network are not clear at all. Changing the input layer has an impact on the dimensions of all other layers because of the way the convolution operation works, and this is not discussed in the paper. The operations performed to achieve that output dimension are not clear either. Besides what you did, I would suggest using a pooling layer to get the output dimensions you want (a sketch follows below). Finally, the paper doesn't explain how the positioning of the plate is guaranteed. I would try to get more details about the paper you are trying to reproduce, to be sure you are not missing anything in your model.
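To illustrate the pooling idea only (all shapes here are assumptions, not what the paper does): a backbone feature map can be pooled down to one column per character and mapped to the 35 classes with a 1x1 convolution, giving a (7, 35) output without any dense heads.

from tensorflow.keras.layers import AveragePooling2D, Conv2D, Input, Reshape, Softmax
from tensorflow.keras.models import Model

# Hedged sketch: pool a hypothetical 7 x 14 x 1280 feature map down to a
# 1 x 7 grid (one cell per character position), then use a 1x1 convolution
# to map the channels to the 35 classes. All shapes are assumptions.
feat = Input(shape=(7, 14, 1280))
x = AveragePooling2D(pool_size=(7, 2))(feat)   # -> (1, 7, 1280)
x = Conv2D(35, kernel_size=1)(x)               # -> (1, 7, 35)
x = Reshape((7, 35))(x)
head = Model(feat, Softmax(axis=-1)(x))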