I'm doing simple recognition of letters and digits with neural networks. Up to now I have used every pixel of the letter's image as an input to the network. Needless to say, this approach produces very large networks, so I'd like to extract features from my images and use those as inputs instead. My first question is: what properties of the letters are good for recognizing them? My second question is: how do I represent these features as inputs to a neural network? For example, I may have detected all corners in a letter and have them as a vector of (x, y) points. How do I transform this vector into something suitable for an NN, given that the vector sizes may differ between letters?
The article "Introduction to Artificial Intelligence. OCR using Artificial Neural Networks" by Kluever (2008) gives a survey of 4 feature extraction techniques for OCR using neural networks. One of the methods he describes is the projection profile, which reduces the width * height image matrix to width + height values: you use the width vector and the height vector as separate inputs.
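As a rough illustration of that idea, here is a minimal NumPy sketch (the function name, the binarization threshold, and the use of on-pixel counts as the profile are my own choices, not something from the article):

```python
import numpy as np

def projection_features(img):
    """Horizontal/vertical projection profile of a character image.

    img: 2-D array (height x width) of grayscale pixel values.
    Returns a vector of length width + height: the count of 'on'
    pixels in each column, followed by the count in each row,
    instead of the width * height raw pixels.
    """
    binary = img > 128                  # assumed threshold; tune for your images
    width_vector = binary.sum(axis=0)   # one value per column (length = width)
    height_vector = binary.sum(axis=1)  # one value per row (length = height)
    return np.concatenate([width_vector, height_vector])
```

The two vectors can be concatenated as above, or wired to separate groups of input neurons if you want to keep them as distinct inputs.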
Lots of people have used a variety of features for OCR. The simplest, of course, is passing the pixel values directly.
There is letter recognition data in the OpenCV samples, extracted from the UCI data set. It employs about 16 different features. Check this SO question: How to create data from image like "Letter Image Recognition Dataset" from UCI.
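If you want to experiment with that data directly, it can be loaded roughly like this (a sketch mirroring the parsing done in OpenCV's letter_recog.py sample; the local file name letter-recognition.data is an assumption):

```python
import numpy as np

# Each line of the UCI file is: letter,f1,f2,...,f16
# where the 16 fields are integer features (box position, pixel
# counts, edge statistics, and so on).
data = np.loadtxt('letter-recognition.data', dtype='float32',
                  delimiter=',',
                  converters={0: lambda c: ord(c) - ord('A')})
labels = data[:, 0].astype(int)   # 0..25 for 'A'..'Z'
features = data[:, 1:]            # 16 features per sample
```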
You can also find the paper explaining these features in one of its answers, or by googling.
You might also be interested in this PPT, which gives a concise explanation of the different feature extraction techniques in use today.
If you have a very high-dimensional input vector, I suggest you apply principal component analysis (PCA) to remove redundant features and reduce the dimensionality of the feature vector.
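A minimal scikit-learn sketch of that step (the 95% variance threshold and the random placeholder pixel matrix are assumptions for illustration, not recommendations):

```python
import numpy as np
from sklearn.decomposition import PCA

# X: one row per training image, one column per raw pixel.
X = np.random.rand(1000, 32 * 32)   # placeholder for your real data

pca = PCA(n_components=0.95)  # keep components explaining 95% of the variance
X_reduced = pca.fit_transform(X)
print(X.shape, '->', X_reduced.shape)

# New samples must go through the SAME fitted transform:
# x_reduced = pca.transform(x.reshape(1, -1))
```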