I have tried Tesseract on the iPhone and measured its accuracy at 70% without image preprocessing. I also noticed that it may be poor at extracting digits. I have heard about the OCRopus OCR engine. Which is better, Tesseract or OCRopus, for digit extraction when little image preprocessing is applied?
Has anyone run tests on both engines and compared the results using the usual metrics?
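For anyone who wants to run such a comparison themselves, here is a minimal sketch of one common metric, the character error rate (edit distance divided by the length of the ground-truth text), plus a digits-only variant. The function names and the sample strings are made up for illustration, and the digits-only variant is just one possible way to focus on digit extraction; it is not taken from any published benchmark.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def digit_error_rate(reference, hypothesis):
    """Hypothetical helper: the same metric computed on the digits only."""
    ref_digits = "".join(c for c in reference if c.isdigit())
    hyp_digits = "".join(c for c in hypothesis if c.isdigit())
    return character_error_rate(ref_digits, hyp_digits)


if __name__ == "__main__":
    truth = "Invoice 1024"       # hand-typed ground truth (made-up sample)
    ocr_output = "Invoice 1O24"  # made-up engine output: zero read as letter O
    print(character_error_rate(truth, ocr_output))
    print(digit_error_rate(truth, ocr_output))
```

Running the same ground-truth set through each engine and averaging these rates gives numbers that can be compared directly; lower is better.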
Google does well on the scanned email and recognizes the text in the smartphone-captured document about as well as ABBYY does. However, it is much better than Tesseract or ABBYY at recognizing handwriting, as the second result image shows: still far from perfect, but at least it got some things right.
Overall results of OCR text accuracy with 90% confidence intervals: Google Cloud Platform's Vision OCR tool has the highest text accuracy, at 98.0%, when the whole data set is tested.
While Tesseract is known as one of the most accurate free OCR engines available today, it has numerous limitations that dramatically affect its performance, that is, its ability to correctly recognize characters in a scan or image.
Inevitably, noise in an input image, non-standard fonts that Tesseract wasn't trained on, or less than ideal image quality will cause Tesseract to make a mistake and incorrectly OCR a piece of text.
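To illustrate what even light preprocessing can look like, here is a minimal pure-Python sketch of Otsu binarization, which picks a global threshold separating dark text from a light background. This is only one common technique and an assumption on my part, not something any of the posts here prescribes; the function names are made up, and the input is assumed to be a flat list of 8-bit grayscale values. In practice an image library would normally do this step.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels):
    """Map pixels above the Otsu threshold to white (255), the rest to black (0)."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    # Made-up sample: three dark "ink" pixels and three light "paper" pixels.
    sample = [10, 12, 11, 200, 205, 198]
    print(otsu_threshold(sample))
    print(binarize(sample))
```

A global threshold like this tends to help with evenly lit scans; photos with shadows or uneven lighting usually need something more local.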
Initially, OCRopus used Tesseract as its recognition engine, but it later switched to its own brand-new engine, which is still fresh and not mature. We ran an accuracy comparison about a year ago, and OCRopus was definitely losing to Tesseract, not to mention the commercial engines. I have stopped following OCRopus's progress since then, but I do know that activity on the OCRopus support forum is close to zero now, which suggests that almost no one is using it. Most people use commercial engines, but if price is an issue and they can tolerate lower accuracy, they use Tesseract. It is definitely the best of the open-source engines.
You can also check how active each project is via its "changes" link:
https://code.google.com/p/ocropus/source/list?repo=ocropy
https://code.google.com/p/tesseract-ocr/source/list
Tesseract is much more active.