
How should I optimize a neural network for image classification using pretrained models?


Thank you for viewing my question. I'm trying to do image classification based on some pre-trained models; the images should be classified into 40 classes. I want to use the VGG and Xception pre-trained models to convert each image into two 1000-dimensional vectors, then stack them into a 1×2000-dimensional vector as the input of my network, which has a 40-dimensional output. The network has 2 hidden layers, one with 1024 neurons and the other with 512 neurons.

Structure: image → VGG (1×1000) and Xception (1×1000) → stacked 1×2000 vector as input → 1024 neurons → 512 neurons → 40-dimensional output → softmax

However, with this structure I can only achieve about 30% accuracy. So my question is: how can I optimize the structure of my network to achieve higher accuracy? I'm new to deep learning, so I'm not quite sure my current design is 'correct'. I'm really looking forward to your advice.
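For reference, the classifier head described above might be sketched as follows (assuming the `tensorflow.keras` API; the 2000-D input stands for the two stacked 1000-D pretrained-model outputs):

```python
import tensorflow as tf

# The head described above: a stacked 1x2000 feature vector
# (1000-D from VGG + 1000-D from Xception) -> 1024 -> 512 -> 40-way softmax.
inputs = tf.keras.Input(shape=(2000,))
x = tf.keras.layers.Dense(1024, activation='relu')(inputs)
x = tf.keras.layers.Dense(512, activation='relu')(x)
outputs = tf.keras.layers.Dense(40, activation='softmax')(x)
head = tf.keras.Model(inputs, outputs)
head.compile(optimizer='adam', loss='categorical_crossentropy',
             metrics=['accuracy'])
```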

asked Sep 16 '17 by sunnwmy


2 Answers

I'm not entirely sure I understand your network architecture, but some pieces don't look right to me.

There are two major transfer learning scenarios:

  • ConvNet as fixed feature extractor. Take a pretrained network (either VGG or Xception will do; you don't need both), remove the last fully-connected layer (this layer's outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. For example, in an AlexNet, this would compute a 4096-D vector for every image, containing the activations of the hidden layer immediately before the classifier. Once you extract the 4096-D codes for all images, train a linear classifier (e.g. a linear SVM or softmax classifier) on the new dataset.

    Tip #1: take only one pretrained network.

    Tip #2: no need for multiple hidden layers for your own classifier.

  • Fine-tuning the ConvNet. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it's possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset.

    Tip #3: keep the early pretrained layers fixed.

    Tip #4: use a small learning rate for fine-tuning because you don't want to distort other pretrained layers too quickly and too much.
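Both scenarios might be sketched in Keras like this (assuming the `tensorflow.keras` API; `weights=None` keeps the sketch self-contained, whereas in practice you would pass `weights='imagenet'` to load the pretrained filters):

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG16

# One pretrained network only (Tip #1). include_top=False drops the
# 1000-way ImageNet classifier; pooling='avg' yields a 512-D feature vector.
# In practice use weights='imagenet'; weights=None keeps this sketch light.
base = VGG16(weights=None, include_top=False, pooling='avg',
             input_shape=(224, 224, 3))

# Scenario 1: fixed feature extractor -- freeze everything and train only a
# single softmax classifier on top (Tip #2: no extra hidden layers needed).
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(40, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# Scenario 2: fine-tuning -- once the classifier has converged, unfreeze
# only the last few convolutional layers (Tip #3: early layers stay fixed)
# and recompile with a small learning rate (Tip #4).
base.trainable = True
for layer in base.layers[:-4]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='categorical_crossentropy', metrics=['accuracy'])
```

After each change to `trainable`, the model must be recompiled for the freeze to take effect during training.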

This architecture much more closely resembles the ones I've seen solve the same problem, and it has a much better chance of reaching high accuracy.

Maxim answered Sep 23 '22


There are a couple of steps you can try when the model is not fitting well:

  1. Increase training time and decrease the learning rate. The model may be stopping at a very bad local optimum.
  2. Add additional layers that can extract features specific to the large number of classes.
  3. Create a separate two-class deep network for each class ('yes' or 'no' output). This lets each network specialize in one class, rather than training one single network to learn all 40 classes.
  4. Increase the number of training samples.
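Steps 1 and 4 might be sketched in Keras like this (assuming the `tensorflow.keras` API): train longer with a learning rate that decays when validation loss plateaus, and augment the images to effectively increase the number of training samples.

```python
import tensorflow as tf

# Step 1: longer training at a lower learning rate, reduced further when
# validation loss stops improving, with early stopping as a safeguard.
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                                         patience=3, min_lr=1e-6),
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10,
                                     restore_best_weights=True),
]

# Step 4: simple augmentation layers that generate varied views of each
# image at training time, effectively enlarging the training set.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])
```

Usage would be along the lines of a hypothetical `model.fit(..., callbacks=callbacks)` with `augment` applied to each training batch.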
Martin answered Sep 25 '22