
Digit Recognition on CNN

Tags: machine-learning, deep-learning, ocr, image-recognition, handwriting-recognition

I am testing printed digits (0-9) on a convolutional neural network. It reaches 99+% accuracy on the MNIST dataset, but when I trained it on images generated from fonts installed on my computer (Arial, Calibri, Cambria, Cambria Math, Times New Roman), the training error rate does not go below 80%, i.e. 20% accuracy. The font data is 104 images per font (25 fonts in total, 4 images per font with little difference between them). Why?

Here is a sample of the images for the digit "2":

[Image: "2" Number Images, https://i.stack.imgur.com/IjUtr.png]

I resized every image to 28 x 28.

Here are more details:

Training data size: 28 x 28 images. Network parameters: as in LeNet-5. Network architecture:

Input Layer - 28x28
| Convolutional Layer - (Relu Activation)
| Pooling Layer - (Tanh Activation)
| Convolutional Layer - (Relu Activation)
| Local Layer(120 neurons) - (Relu)
| Fully Connected (Softmax Activation, 10 outputs)

This works, giving 99+% accuracy on MNIST. Why is it so bad with computer-generated fonts? A CNN should be able to handle a lot of variance in the data.

532

asked Jul 15 '16 07:07

kumar030

1 Answer

I see two likely problems:

Preprocessing: MNIST images are not only 28 x 28 px; they are also normalized and centered:

The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

Source: MNIST website
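The preprocessing described in the quote could be approximated for font-rendered digits roughly as follows. This is a minimal sketch, not the original NIST pipeline: the function name `mnist_style` is hypothetical, the input is assumed to be a 2-D array with background 0 and ink greater than 0, and nearest-neighbour resampling is used only to keep the example NumPy-only (so it does not reproduce the anti-aliased grey levels mentioned above).

```python
import numpy as np


def mnist_style(img, box=20, size=28):
    """Fit a digit into a box x box area, then center it by center of mass.

    img: 2-D array, background 0 and ink > 0 (any height and width).
    Hypothetical helper; assumes box <= size.
    """
    img = np.asarray(img, dtype=np.float64)
    rows, cols = np.nonzero(img > 0)
    if rows.size == 0:
        return np.zeros((size, size))
    # Crop to the bounding box of the ink.
    crop = img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    h, w = crop.shape
    # Scale so the longer side becomes `box` pixels (aspect ratio preserved).
    scale = box / max(h, w)
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Nearest-neighbour resampling (an assumption, to avoid extra libraries).
    r_idx = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    c_idx = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    small = crop[np.ix_(r_idx, c_idx)]
    # Center of mass of the resized digit (geometric center if no ink survived).
    total = small.sum()
    if total > 0:
        cy = (small.sum(axis=1) * np.arange(new_h)).sum() / total
        cx = (small.sum(axis=0) * np.arange(new_w)).sum() / total
    else:
        cy, cx = (new_h - 1) / 2.0, (new_w - 1) / 2.0
    # Place the digit so its center of mass lands near the canvas center.
    center = (size - 1) / 2.0
    top = min(max(int(round(center - cy)), 0), size - new_h)
    left = min(max(int(round(center - cx)), 0), size - new_w)
    out = np.zeros((size, size))
    out[top:top + new_h, left:left + new_w] = small
    return out
```

Applying a step like this to the font images before training (instead of a plain resize to 28 x 28) would make them structurally closer to MNIST; a library resize with anti-aliasing would be closer still.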

Overfitting:

MNIST has 60,000 training examples and 10,000 test examples. How many do you have?
Did you try dropout (see paper)?
Did you try dataset augmentation techniques? (E.g. slightly shifting the image or changing the aspect ratio a bit. You could also add noise, although I don't think that will help.)
Did you try smaller networks? (And how big are your filters / how many filters do you have?)
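As an illustration of the augmentation idea in the list above, here is a minimal NumPy sketch that produces randomly shifted, slightly noisy copies of an image. The helper names `shift` and `augment` and the default values for `max_shift` and `noise_std` are assumptions chosen for the example; pixel values are assumed to lie in [0, 1]. Whether this actually helps on the font data is an open question.

```python
import numpy as np


def shift(img, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling vacated pixels with 0.

    Assumes |dy| < height and |dx| < width.
    """
    h, w = img.shape
    out = np.zeros_like(img)
    # Source and destination windows that overlap after the shift.
    src_r = slice(max(0, -dy), min(h, h - dy))
    dst_r = slice(max(0, dy), min(h, h + dy))
    src_c = slice(max(0, -dx), min(w, w - dx))
    dst_c = slice(max(0, dx), min(w, w + dx))
    out[dst_r, dst_c] = img[src_r, src_c]
    return out


def augment(img, rng, max_shift=2, noise_std=0.05):
    """Return one randomly shifted, noisy copy of img (values in [0, 1])."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    noisy = shift(img, int(dy), int(dx)) + rng.normal(0.0, noise_std, img.shape)
    return np.clip(noisy, 0.0, 1.0)
```

For example, calling `augment(img, np.random.default_rng(0))` several times per source image would multiply the size of a small font dataset, with each copy differing slightly in position and pixel noise.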

Remarks

Interesting idea! Did you try simply applying the network trained on MNIST to your data? What are the results?

119

answered Sep 23 '22 05:09

Martin Thoma
