Does Tesseract OCR use neural networks as its default training mechanism?

Sorry, this is probably a dumb question, but I am fairly new to machine learning and Tesseract OCR. I have heard that Tesseract OCR can be trained.

What I need to know is: does Tesseract OCR use neural networks as its default training mechanism, or do we have to program it explicitly to use neural networks?

Sorry if I'm thinking about this "training" concept the wrong way. What I need to know exactly is whether Tesseract already uses NNs and, if not, how I can approach using NNs with Tesseract OCR to improve recognition accuracy.

If anyone can suggest some good resources or a way to get started, that would be a great help too.

What I currently know: the basic concept of supervised training in machine learning, and how to perform basic image OCR operations in Tesseract OCR.

asked Apr 10 '15 12:04

HarshaXsoad

2 Answers

It appears that Tesseract uses an adaptive classifier by default. This paper is a good read:

https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf

There appears to be an option called "Cube mode" that switches the learning system to NNs instead of the adaptive classifier (https://code.google.com/p/tesseract-ocr-extradocs/wiki/Cube). More info about adaptive classifiers:

http://www.cs.indiana.edu/~rawlins/website/adaptivity/information-helper.html

Also closely related is the learning classifier system:

http://en.wikipedia.org/wiki/Learning_classifier_system

Also, your understanding of "training" is very close. Training is how you teach a pattern recognition or learning system what responses it should give to certain input sets. When it later encounters unknown data, it uses similarities to the training data to classify it. Machine learning is one of the coolest fields in existence in my opinion (probably a biased opinion, but whatever!). Keep up the learning! You are the meta learner: learning how to teach a machine to learn! Cool stuff!

answered Oct 20 '22 20:10

NKamrath

Yes. Starting with Tesseract 4.0, it provides a new LSTM-based OCR engine: https://tesseract-ocr.github.io/tessdoc/NeuralNetsInTesseract4.00
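To make the engine choice concrete, here is a minimal sketch (not part of the original answer) that calls the Tesseract 4+ command-line tool from Python and selects the engine with the `--oem` flag. The helper names `build_tesseract_command` and `run_ocr`, and the `eng` language default, are assumptions made for illustration; it also assumes the `tesseract` executable is installed and on your PATH.

```python
import shutil
import subprocess

# Engine modes accepted by the --oem flag in Tesseract 4.x and later.
OEM_LEGACY = 0           # legacy (non-neural) engine only
OEM_LSTM = 1             # neural-net LSTM engine only
OEM_LEGACY_AND_LSTM = 2  # both engines combined
OEM_DEFAULT = 3          # whatever the installed traineddata supports


def build_tesseract_command(image_path, oem=OEM_LSTM, lang="eng"):
    """Return the argv list for a Tesseract CLI call that prints text to stdout.

    Hypothetical helper for illustration; only the CLI flags are Tesseract's own.
    """
    return ["tesseract", image_path, "stdout", "--oem", str(oem), "-l", lang]


def run_ocr(image_path, oem=OEM_LSTM, lang="eng"):
    """Run the Tesseract CLI on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, oem, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_ocr("page.png")` (with your own image path) would use the LSTM engine, while passing `oem=OEM_LEGACY` asks for the older classifier. The legacy mode should only work if your installed traineddata file still contains the legacy model.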

answered Oct 20 '22 19:10

b.g.

Tags: c++, machine-learning, neural-network, tesseract
