
Is OCR no longer an issue?

According to Wikipedia, "The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem on applications where clear imaging is available such as scanning of printed documents." However, it gives no citation.

My question is: is this true? Is the current state of the art so good that, for a good scan of English text, there are no major improvements left to be made?

Or, a less subjective form of this question is: how accurate are modern OCR systems at recognising English text for good quality scans?
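To make "how accurate" concrete, one common measure is the character error rate (CER): the edit distance between the OCR output and a hand-checked transcription, divided by the length of the transcription. Below is a minimal sketch of that calculation. The function names `edit_distance` and `character_error_rate` and the sample strings are made up for illustration and do not come from any particular OCR package.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning reference into hypothesis."""
    # prev[j] holds the distance between the first i-1 reference characters
    # and the first j hypothesis characters.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]  # distance from i reference characters to an empty hypothesis
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                       # delete ref_char
                curr[j - 1] + 1,                   # insert hyp_char
                prev[j - 1] + substitution_cost,   # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the reference length (hypothetical helper)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    truth = "clear"       # hand-checked transcription (sample data)
    ocr_output = "c1ear"  # what an OCR engine might return (sample data)
    print(character_error_rate(truth, ocr_output))
```

Note that the CER can exceed 1.0 when the OCR output is much longer than the reference, so "accuracy = 1 - CER" is only a rough shorthand.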

David Johnstone asked Mar 01 '23 04:03



1 Answer

I think that it is indeed a solved problem. Just look at the plethora of OCR technology articles for C#, C++, Java, etc.

Of course, the article does stress that the text needs to be typewritten and clearly imaged. That makes recognition a relatively trivial task, whereas if you need to OCR scanned pages (noise) or handwriting (diffusion), it gets trickier because there are more things to tune correctly.

NT_ answered Mar 05 '23 17:03
