 

How to fine-tune a keras model with existing plus newer classes?

Good day!

I have a celebrity dataset on which I want to fine-tune a Keras built-in model. So far, from what I have explored and done, we remove the top layers of the original model (or, preferably, pass include_top=False) and add our own layers, then train only the newly added layers while keeping the previous layers frozen. This part is fairly intuitive.
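What I have so far looks roughly like this (just a sketch; the backbone choice, input shape, and class count are placeholders for my own setup):

```python
# Minimal sketch of the standard fine-tuning setup described above.
# NUM_NEW_CLASSES and the input shape are placeholders.
from tensorflow import keras

NUM_NEW_CLASSES = 2  # e.g. two celebrities

# Load an ImageNet-pretrained backbone without its 1000-way head
base = keras.applications.ResNet50(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze the pretrained layers

# Add a new classification head for the new classes only
inputs = keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
outputs = keras.layers.Dense(NUM_NEW_CLASSES, activation="softmax")(x)
model = keras.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(new_dataset, epochs=...)
```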

Now what I require is that my model learns to identify the celebrity faces while also remaining able to detect all the other objects it was trained on before. Originally, the models trained on ImageNet come with an output layer of 1000 neurons, each representing a separate class. I'm confused about how the model could also detect the new classes. All the transfer-learning and fine-tuning articles and blogs tell us to replace the original 1000-neuron output layer with a new N-neuron layer (N = number of new classes). In my case I have two celebrities, so if I add a new layer with 2 neurons, I don't see how the model can still classify the original 1000 ImageNet objects.

I need a pointer on this whole thing: how exactly can I teach a pre-trained model two new celebrity faces while maintaining its ability to recognize all 1000 ImageNet objects as well?

Thanks!

asked Dec 31 '22 by Syed Ali Hamza


1 Answer

CNNs are prone to forgetting previously learned knowledge when retrained for a new task on a novel domain. This phenomenon is often called catastrophic forgetting, and it is an active and challenging research area.

Coming to the point, one obvious way to enable a model to classify new classes along with the old ones is to train it from scratch on the accumulated (old + new) dataset, which is time consuming.

In contrast, several alternative approaches have been proposed in the (class-incremental) continual learning literature in recent years to tackle this scenario:

  1. Firstly, you can use a small subset of the old dataset along with the new dataset to train your new model; this is referred to as a rehearsal-based approach. Note that you can train a GAN to generate pseudo-samples of the old classes instead of storing a subset of raw samples. During training, a distillation loss is used to make the new model mimic the predictions of the old model (whose weights are frozen), which helps avoid forgetting old knowledge; a rough sketch of this loss appears after this list.
  2. Secondly, since not all neurons in a model contribute equally, while training the new model you can preferentially update the neurons that are less important for the old classes, so that old knowledge is retained. You can check out the Elastic Weight Consolidation (EWC) paper for more details.
  3. Thirdly, you can grow your model dynamically to extract features that are specific to the new classes without harming the weights that are important for the old classes. You can check out the Dynamically Expandable Networks (DEN) paper for more details.
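For illustration, here is a rough sketch of the distillation idea from point 1. It assumes a frozen copy of the old network (`old_model`, 1000 old classes) and a new model whose output covers old plus new classes; all names, the temperature, and the loss weighting are placeholder choices, not a fixed recipe:

```python
# Rough sketch of rehearsal + distillation (point 1).
# Both models are assumed to output raw logits; batches are drawn from the
# new data plus the stored (or GAN-generated) old-class exemplars.
import tensorflow as tf

TEMPERATURE = 2.0      # softens the old model's predictions
ALPHA = 0.5            # balance between distillation and classification loss
NUM_OLD_CLASSES = 1000

def combined_loss(x, y_true, old_model, new_model):
    new_logits = new_model(x, training=True)

    # Distillation term: match the frozen old model's softened predictions
    # on the old-class portion of the new model's output.
    old_probs = tf.nn.softmax(old_model(x, training=False) / TEMPERATURE)
    new_old_probs = tf.nn.softmax(new_logits[:, :NUM_OLD_CLASSES] / TEMPERATURE)
    distill = tf.keras.losses.categorical_crossentropy(old_probs, new_old_probs)

    # Standard cross-entropy on the true labels (old + new classes).
    ce = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, new_logits, from_logits=True)

    return ALPHA * distill + (1.0 - ALPHA) * ce
```

You would minimize this loss in a custom training loop; the distillation term keeps the old-class predictions close to the original model while the cross-entropy term teaches the new classes.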
answered Jan 13 '23 by Kaushik Roy