
Minimum requirements for Google TensorFlow image classifier

We are planning to build image classifiers using Google TensorFlow.

What are the minimum and the optimum requirements for training a custom image classifier with a deep convolutional neural network?

The questions are specifically:

  • How many images per class should be provided at a minimum?
  • Do we need to provide approximately the same number of training images per class, or can the number per class differ widely?
  • What is the impact of wrong image data in the training data? E.g. 500 images of a tennis shoe and 50 of other shoes.
  • Is it possible to train a classifier with many more classes than the recently published Inception-v3 model? Let's say: 30,000.
Jabb asked Dec 08 '15 17:12



2 Answers

"how many images per class should be provided at a minimum?"

Depends how you train.

If you train a new model from scratch, purely supervised: for a rule of thumb on the number of images, look at the MNIST and CIFAR tasks. These seem to work OK with about 5,000 images per class.

You can probably bootstrap your network by beginning with a model trained on ImageNet. This model will already have good features, so it should be able to learn to classify new categories without as many labeled examples. I don't think this is well-studied enough to tell you a specific number.

If you also train with unlabeled data (semi-supervised learning), maybe only 100 labeled images per class are needed. There is a lot of recent research on this topic, though it has not yet scaled to tasks as large as ImageNet. Simple to implement:

http://arxiv.org/abs/1507.00677

Complicated to implement:

http://arxiv.org/abs/1507.02672
http://arxiv.org/abs/1511.06390
http://arxiv.org/abs/1511.06440

"Do we need to provide approximately the same number of training images per class, or can the number per class differ widely?"

It should work with different numbers of examples per class.
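If the counts differ a lot and the rare classes end up poorly recognized, one common option (not something this answer prescribes) is to weight each class's loss inversely to its frequency. The sketch below is a minimal illustration in plain Python, assuming the "balanced" heuristic weight = N / (K * n_c), where N is the total number of examples, K the number of classes, and n_c the count for class c. The function name and the class names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class to N / (K * n_c), so rare classes get larger weights."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


# Hypothetical imbalanced dataset: 500 examples of one class, 50 of another.
labels = ["class_a"] * 500 + ["class_b"] * 50
weights = balanced_class_weights(labels)
print(weights)
```

The resulting weights would typically be multiplied into the per-example loss; whether that helps depends on the model and data.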

"what is the impact of wrong image data in the training data? E.g. 500 images of a tennis shoe and 50 of other shoes."

You should use the label smoothing technique described in this paper:

http://arxiv.org/abs/1512.00567

Smooth the labels based on your estimate of the label error rate.
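As a rough sketch of what that means, assuming the uniform form of label smoothing, the one-hot target is mixed with a uniform distribution over the K classes: q'(k) = (1 - epsilon) * onehot(k) + epsilon / K. The function name `smooth_labels` and the value epsilon = 0.1 below are illustrative assumptions; in practice epsilon would come from your own estimate of the label error rate.

```python
def smooth_labels(true_class, num_classes, epsilon):
    """Return a smoothed target distribution for a single example.

    Each class gets epsilon / num_classes; the labeled class additionally
    gets the remaining 1 - epsilon of the probability mass.
    """
    if not 0 <= true_class < num_classes:
        raise ValueError("true_class must be in [0, num_classes)")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    target = [epsilon / num_classes] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


# Example: 4 classes, labeled class 1, assumed 10% label noise.
print(smooth_labels(true_class=1, num_classes=4, epsilon=0.1))
```

The smoothed vector replaces the hard one-hot target in the cross-entropy loss, so a mislabeled example pulls the model less strongly toward the wrong class.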

"Is it possible to train a classifier with many more classes than the recently published Inception-v3 model? Let's say: 30,000."

Yes

Ian Goodfellow answered Sep 28 '22 08:09


"How many images per class should be provided at a minimum?"

"Do we need to provide approximately the same number of training images per class, or can the number per class differ widely?"

"What is the impact of wrong image data in the training data? E.g. 500 images of a tennis shoe and 50 of other shoes."

These three questions are not really TensorFlow-specific. The short answer is that it depends on how resilient your model is to unbalanced datasets and noisy labels.

"Is it possible to train a classifier with many more classes than the recently published Inception-v3 model? Let's say: 30,000."

Yes, definitely. This would mean a much larger classifier layer, so your training time might be longer. Other than that, there are no limitations in TensorFlow.
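To get a rough sense of "much larger", the sketch below counts the weights and biases in a fully connected softmax layer. It assumes the 2048-dimensional pooled feature vector that Inception-v3 feeds into its final layer and 1,000 classes for the published ImageNet model; the helper name is hypothetical.

```python
def classifier_layer_params(feature_dim, num_classes):
    """Parameters in a dense softmax layer: one weight per (feature, class) plus one bias per class."""
    return feature_dim * num_classes + num_classes


FEATURE_DIM = 2048  # assumed size of the pooled Inception-v3 features

imagenet_params = classifier_layer_params(FEATURE_DIM, 1000)
large_params = classifier_layer_params(FEATURE_DIM, 30000)

print(imagenet_params)                 # 2049000
print(large_params)                    # 61470000
print(large_params / imagenet_params)  # 30.0
```

Under these assumptions the final layer grows about 30-fold, to roughly 61 million parameters, which affects both memory use and the cost of each training step.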

keveman answered Sep 28 '22 06:09