According to Wikipedia, "The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem on applications where clear imaging is available such as scanning of printed documents." However, it gives no citation.
My question is: is this true? Is the current state-of-the-art so good that - for a good scan of English text - there aren't any major improvements left to be made?
Or, a less subjective form of this question is: how accurate are modern OCR systems at recognising English text for good quality scans?
I think that it is indeed a solved problem. Just have a look at the plethora of OCR technology articles for C#, C++, Java, etc.
Of course, the article does stress that the text needs to be typewritten and the imaging clear. Under those conditions recognition is a relatively easy task, whereas if you need to OCR noisy scanned pages or handwriting, it gets trickier because there are more things to tune correctly.
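To put a number on "how accurate", one common approach is to compare the OCR output against a hand-checked transcription and compute the character error rate (CER): the edit distance between the two strings divided by the length of the reference. The snippet below is only a minimal sketch of that idea; the function names and the sample strings are made up for illustration and do not come from any particular OCR library.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning reference into hypothesis."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]  # distance from i reference characters to an empty hypothesis
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # delete ref_char
                curr[j - 1] + 1,                  # insert hyp_char
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: ground truth vs. what an OCR engine returned.
    truth = "The quick brown fox"
    ocr_output = "The qu1ck brown f0x"
    cer = character_error_rate(truth, ocr_output)
    print(f"CER: {cer:.3f}, character accuracy: {1 - cer:.3f}")
```

Running this over a set of good-quality scans with known transcriptions gives a measured accuracy figure, which is a less subjective basis for the "solved problem" claim than the number of articles on the topic.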