I'm building an OCR engine in Java. My objective is to extract text from a video file as a post-processing step.
It has been a difficult search trying to find a free, open-source OCR library that works purely in Java. Tess4J seems to be the only popular option, but since it relies on a native interface, I felt inclined to develop the algorithm from scratch.
I need a dependable OCR that identifies English letters and digits (computer typefaces only, not handwritten text) with reasonable accuracy. The region of the video frame in which the text lies is pre-defined, and we can also assume that the color of the text is known.
What I've done so far:
(All image processing is done using the Java bindings for OpenCV.)
I've extracted features for training my classifier using:
A. Pixel intensities, after down-sampling the character image to 12 x 12 resolution (144 features).
B. Gabor wavelet transform at 8 different angles (0, 11.25, 22.5, ... degrees), with the energy computed as the mean squared value of the filter response at each angle (8 features).
A + B together give the feature vector of each image (152 features in total); a rough OpenCV sketch of the extraction follows below.
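For reference, a minimal sketch of how this extraction might look with the OpenCV Java bindings. The Gabor kernel size and parameters (sigma, lambda, gamma) are illustrative guesses, not values from the original post, and applying the filters to the down-sampled image is also an assumption:

    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;

    // charImg: a single pre-segmented character image (grayscale, CV_32F), prepared upstream
    double[] features = new double[152];

    // A. 144 intensity features from a 12x12 down-sampled copy
    Mat small = new Mat();
    Imgproc.resize(charImg, small, new Size(12, 12));
    for (int r = 0; r < 12; r++)
        for (int c = 0; c < 12; c++)
            features[r * 12 + c] = small.get(r, c)[0];

    // B. 8 Gabor energies, one per orientation (0, 11.25, 22.5, ... degrees)
    for (int i = 0; i < 8; i++) {
        double theta = Math.toRadians(11.25 * i);
        Mat kernel = Imgproc.getGaborKernel(new Size(9, 9), 4.0, theta, 8.0, 0.5, 0, CvType.CV_32F);
        Mat response = new Mat();
        Imgproc.filter2D(small, response, CvType.CV_32F, kernel);
        Mat squared = new Mat();
        Core.multiply(response, response, squared);
        features[144 + i] = Core.mean(squared).val[0];   // energy = mean squared response
    }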
I have 62 classes for classification, viz. 0, 1, 2, ..., 9 | a, b, c, ..., z | A, B, C, ..., Z.
I train the classifier using 20 x 62 samples (20 for each class).
For classification, I've used the following two approaches (a rough OpenCV sketch of both follows this list):
A. An ANN with 1 hidden layer of 120 nodes. The input layer has 152 nodes and the output layer has 62. The hidden and output layers use the sigmoid activation function, and the network is trained with resilient backpropagation (RPROP).
B. kNN classification over the full 152-dimensional feature vector.
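A rough sketch of both classifiers using OpenCV's ML module. The Mat names trainData (one 152-dimensional row per sample, CV_32F), oneHot (62-column one-hot targets for the MLP), labels (single-column class indices for kNN), and testRow are placeholders for data assumed to be prepared elsewhere:

    import org.opencv.core.*;
    import org.opencv.ml.ANN_MLP;
    import org.opencv.ml.KNearest;
    import org.opencv.ml.Ml;

    // A. MLP: 152 -> 120 -> 62, sigmoid units, trained with RPROP
    ANN_MLP mlp = ANN_MLP.create();
    Mat layers = new Mat(1, 3, CvType.CV_32S);
    layers.put(0, 0, 152, 120, 62);
    mlp.setLayerSizes(layers);
    mlp.setActivationFunction(ANN_MLP.SIGMOID_SYM);
    mlp.setTrainMethod(ANN_MLP.RPROP);
    mlp.train(trainData, Ml.ROW_SAMPLE, oneHot);

    // B. kNN over the same 152-dimensional vectors
    KNearest knn = KNearest.create();
    knn.train(trainData, Ml.ROW_SAMPLE, labels);
    Mat results = new Mat();
    knn.findNearest(testRow, 3, results);   // results holds the predicted class index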
Where I stand:
k-nearest neighbor search is turning out to be a better classifier than the neural network (so far). However, even with kNN, I'm finding it difficult to tell certain visually similar letters apart.
Moreover, some characters are being misclassified as Z, to name a few of the anomalies.
What I'm looking for:
I want to find out the following:
Why is the ANN under-performing? What network configuration should I use to push its performance higher? Can the ANN be tuned to perform better than kNN search?
What other features can I use to make the OCR more robust?
Any other suggestions for performance optimization are welcome.
Increase the contrast and density before carrying out the OCR process. This can be done in the scanning software itself or in any other image-processing software. Increasing the contrast between the text and its background brings more clarity to the output.
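Since the input here is a video frame rather than a scan, the same idea can be applied in code. A minimal OpenCV sketch, assuming gray holds the cropped grayscale text region (8-bit, single channel):

    import org.opencv.core.Mat;
    import org.opencv.imgproc.Imgproc;

    Mat enhanced = new Mat();
    Imgproc.equalizeHist(gray, enhanced);                  // spread the intensity range
    Mat binary = new Mat();
    Imgproc.threshold(enhanced, binary, 0, 255,
            Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);  // binarize text against background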
Popular answer:
The Tesseract engine is available on Google Code and is one of the best open-source OCRs out there.
The kNN algorithm doesn't need much tuning, unlike neural networks, so you can obtain good performance easily, but a well-tuned multi-layer perceptron may still outperform kNN. Currently, the best results are achieved with deep learning; you should take a look at convolutional neural networks, for example.
From Wikipedia:
A CNN is composed of one or more convolutional layers with fully connected layers (matching those in typical artificial neural networks) on top. It also uses tied weights and pooling layers. This architecture allows CNNs to take advantage of the 2D structure of input data. In comparison with other deep architectures, convolutional neural networks are starting to show superior results in both image and speech applications. They can also be trained with standard backpropagation. CNNs are easier to train than other regular, deep, feed-forward neural networks and have many fewer parameters to estimate, making them a highly attractive architecture to use.
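If you want to try this in Java, a small convolutional network could be sketched with Deeplearning4j (my own suggestion, not part of the original answer), assuming the 12 x 12 grayscale character crops and 62 classes described above; the layer sizes and hyperparameters are illustrative:

    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.inputs.InputType;
    import org.deeplearning4j.nn.conf.layers.*;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.learning.config.Nesterovs;
    import org.nd4j.linalg.lossfunctions.LossFunctions;

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .seed(123)
            .updater(new Nesterovs(0.01, 0.9))
            .list()
            // 12x12 input -> 8x8 feature maps (5x5 convolution) -> 4x4 after max pooling
            .layer(0, new ConvolutionLayer.Builder(5, 5)
                    .nIn(1).nOut(20).activation(Activation.RELU).build())
            .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                    .kernelSize(2, 2).stride(2, 2).build())
            .layer(2, new DenseLayer.Builder().nOut(100).activation(Activation.RELU).build())
            .layer(3, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nOut(62).activation(Activation.SOFTMAX).build())
            .setInputType(InputType.convolutionalFlat(12, 12, 1))
            .build();

    MultiLayerNetwork cnn = new MultiLayerNetwork(conf);
    cnn.init();
    // cnn.fit(trainingIterator);  // a DataSetIterator over the labeled character crops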
Speaking of your MLP, there are many algorithms for searching for better hyperparameters, for example grid search or swarm optimization. I like to use a genetic algorithm to tune the parameters of a NN; it's quite simple and yields good performance.
I recommend JGAP, a nice genetic algorithm framework in Java that can be used out of the box :) (see the tuning sketch after the excerpt below).
Here is JGAP's own introduction to genetic algorithms, which explains them better than I could:
Genetic algorithms (GA's) are search algorithms that work via the process of natural selection. They begin with a sample set of potential solutions which then evolves toward a set of more optimal solutions. Within the sample set, solutions that are poor tend to die out while better solutions mate and propagate their advantageous traits, thus introducing more solutions into the set that boast greater potential (the total set size remains constant; for each new solution added, an old one is removed). A little random mutation helps guarantee that a set won't stagnate and simply fill up with numerous copies of the same solution.
In general, genetic algorithms tend to work better than traditional optimization algorithms because they're less likely to be led astray by local optima. This is because they don't make use of single-point transition rules to move from one single instance in the solution space to another. Instead, GA's take advantage of an entire set of solutions spread throughout the solution space, all of which are experimenting upon many potential optima.
However, in order for genetic algorithms to work effectively, a few criteria must be met:
It must be relatively easy to evaluate how "good" a potential solution is relative to other potential solutions.
It must be possible to break a potential solution into discrete parts that can vary independently. These parts become the "genes" in the genetic algorithm.
Finally, genetic algorithms are best suited for situations where a "good" answer will suffice, even if it's not the absolute best answer.
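To make this concrete, a minimal JGAP sketch for tuning the hidden-layer size and learning rate of the MLP might look like the following; trainAndScore is a hypothetical method that trains a network with the given parameters and returns its validation accuracy, and the gene ranges are illustrative guesses:

    import org.jgap.*;
    import org.jgap.impl.DefaultConfiguration;
    import org.jgap.impl.DoubleGene;
    import org.jgap.impl.IntegerGene;

    Configuration conf = new DefaultConfiguration();
    conf.setFitnessFunction(new FitnessFunction() {
        @Override
        protected double evaluate(IChromosome c) {
            int hiddenNodes = (Integer) c.getGene(0).getAllele();
            double learningRate = (Double) c.getGene(1).getAllele();
            return trainAndScore(hiddenNodes, learningRate);  // hypothetical: accuracy in [0, 1]
        }
    });

    // One gene per hyperparameter to be tuned
    Gene[] genes = new Gene[] {
        new IntegerGene(conf, 30, 300),        // hidden-layer size
        new DoubleGene(conf, 0.0001, 0.1)      // learning rate
    };
    conf.setSampleChromosome(new Chromosome(conf, genes));
    conf.setPopulationSize(30);

    Genotype population = Genotype.randomInitialGenotype(conf);
    population.evolve(50);                      // run 50 generations
    IChromosome best = population.getFittestChromosome();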