Which OCR Engine is better: Tesseract or OCRopus? [closed]

I have tried Tesseract on the iPhone and assessed its accuracy at 70% without any image preprocessing. I also noticed that it seems to be poor at extracting digits. I have heard about the OCRopus OCR engine: which is better, Tesseract or OCRopus, for digit extraction when little image preprocessing is done?
Has anyone run tests comparing both engines using the usual metrics?
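One of the usual metrics for this kind of comparison is the character error rate (CER): the Levenshtein edit distance between an engine's output and a hand-typed ground truth, divided by the length of the ground truth. The Python sketch below illustrates the metric under stated assumptions and is not a standard benchmark tool. The function names are hypothetical, and the `digit_cer` variant simply discards every non-digit character from both strings before comparing, which is a simplification aimed at the digit-extraction case.

```python
DIGITS = set("0123456789")


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def digit_cer(reference: str, hypothesis: str) -> float:
    """CER on the digits 0-9 only; all other characters are dropped first."""
    ref_digits = "".join(ch for ch in reference if ch in DIGITS)
    hyp_digits = "".join(ch for ch in hypothesis if ch in DIGITS)
    return cer(ref_digits, hyp_digits)
```

For example, `digit_cer("Invoice 2012-04-05", "Invoice 2O12-04-O5")` returns `0.25`: the two letter-O misreads remove two of the eight reference digits. Running both engines on the same set of images and averaging `cer` (or `digit_cer`) per engine gives a like-for-like comparison, where lower is better.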

asked Apr 05 '12 17:04 by Ahmed Hussein




2 Answers

Initially OCRopus actually used Tesseract as its recognition engine, but later they replaced it with their own brand-new engine, which is still fresh and not mature. We made an accuracy comparison about a year ago, and OCRopus was definitely losing to Tesseract, not to mention the commercial engines. Since then I have stopped following OCRopus's progress, but what I definitely know is that activity on the OCRopus support forum is now close to zero. That means no one is using it. Mostly people use commercial engines, but if price is an issue for them and they can tolerate lower accuracy, they use Tesseract. It is definitely the best one among the open-source engines.
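If you do settle on Tesseract, one setting often tried for digit-only fields is a character whitelist. The sketch below shells out to the `tesseract` command-line tool on a desktop machine (for example, to test images offline; an iPhone app would use the library instead). It assumes the executable is on `PATH` and is version 4.1 or later, where `tessedit_char_whitelist` is assumed to be honored by the LSTM engine, and `--psm 7` assumes each image holds a single line of text. The helper names are hypothetical.

```python
import shutil
import subprocess


def tesseract_digits_cmd(image_path: str, psm: int = 7) -> list[str]:
    """Build a tesseract command line that only allows the characters 0-9.

    psm=7 asks Tesseract to treat the image as a single text line.
    """
    return [
        "tesseract", image_path, "stdout",  # print recognized text to stdout
        "--psm", str(psm),
        "-c", "tessedit_char_whitelist=0123456789",
    ]


def ocr_digits(image_path: str) -> str:
    """Run tesseract on image_path and return the recognized text, stripped."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_digits_cmd(image_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

A call such as `ocr_digits("meter_reading.png")` (hypothetical file name) then returns only digits and whitespace. The whitelist narrows what Tesseract may output; it does not make up for a noisy or low-resolution image.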

answered Sep 22 '22 15:09 by Tomato


You can also check the activity of each project through its "changes" link:

https://code.google.com/p/ocropus/source/list?repo=ocropy

https://code.google.com/p/tesseract-ocr/source/list

Tesseract is much busier.
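With local git clones of both projects, the same comparison can be made from the commit history. This is a minimal sketch, assuming `git` is installed and that the clone paths you pass in exist (they are up to you). `git log --date=short --format=%cd` prints one `YYYY-MM-DD` committer date per commit, and the hypothetical helpers below count commits per calendar year.

```python
import subprocess
from collections import Counter
from datetime import date


def read_commit_dates(repo_path: str) -> list[str]:
    """Return one YYYY-MM-DD committer date per commit in the repo at repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--date=short", "--format=%cd"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def commits_per_year(commit_dates: list[str]) -> Counter:
    """Count commits per calendar year from ISO dates such as '2012-04-05'."""
    return Counter(date.fromisoformat(d).year for d in commit_dates)
```

Calling `commits_per_year(read_commit_dates(path))` for each clone and comparing the counts for recent years gives a rough, numeric version of the "which project is busier" check.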

answered Sep 24 '22 15:09 by IvanM