Is OCR no longer an issue?

Question

According to Wikipedia, "The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem on applications where clear imaging is available such as scanning of printed documents." However, it gives no citation.

My question is: is this true? Is the current state of the art so good that, for a good scan of English text, there aren't any major improvements left to be made?

Or, in a less subjective form: how accurate are modern OCR systems at recognising English text in good-quality scans?

NT_ · Accepted Answer

I think that it is indeed a solved problem. Just have a look at the plethora of OCR technology articles for C#, C++, Java, etc.

Of course, the article does stress that the script needs to be typewritten and clearly imaged. That makes recognition a relatively trivial task, whereas if you need to OCR scanned pages (noise) or handwriting (diffusion), it can get trickier, as there are more things to tune correctly.
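To put a number on "how accurate", one common approach is to compare the OCR output against a hand-checked transcription and compute a character error rate (edit distance divided by the length of the reference). Here is a minimal sketch in plain Python; the function names and the sample strings are made up for illustration and don't come from any particular OCR library.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a character from ref
                            curr[j - 1] + 1,      # insert a character into ref
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a ground-truth line and what an OCR engine might return.
ground_truth = "The quick brown fox"
ocr_output = "The qu1ck brown f0x"
cer = character_error_rate(ground_truth, ocr_output)
print(f"CER: {cer:.3f}, character accuracy: {1 - cer:.3f}")
```

Here two of the 19 reference characters are wrong, so the error rate is 2/19. Running the same comparison over a set of clean scans with known text is one way to check the "solved problem" claim for your own documents.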

Tags:

text-extraction

ocr

layout-extraction

David Johnstone

1 Answers

NT_
