
OCR for known font

Tags: fonts, ocr

I'm searching for an OCR library that can be parameterized with a font. I always know which font the text is set in, and I believe the recognition results would be much better this way.

Does anyone know of one?

Paul asked Sep 02 '10 16:09




1 Answer

Most OCR engines will handle this situation quite well. In fact, OCR engines tend to get less confused when there is only one font to recognise on a page. Strange but true, in my experience.

If an OCR engine can read your font in the first place, then I would just use it and not worry about specifying the font. There are more effective options for improving recognition.

Many OCR engines let you set recognition parameters that describe the text: fixed width or proportional, serif or sans-serif, machine print or hand print. You can also restrict recognition to a subset of characters, such as uppercase only or numeric only, which improves results considerably. For example, if you only have numeric characters, then the 0 (zero) character can never be confused with an 'O', 'o' or 'Ø'. You will find these hints more effective than being able to choose the exact font to OCR.
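As a rough illustration of the character-subset hint, here is a minimal sketch assuming Tesseract driven through the pytesseract wrapper with Pillow installed. The helper names `build_tesseract_config` and `read_digits` are hypothetical, and the choice of page segmentation mode 7 (a single line of text) is an assumption about the input. `tessedit_char_whitelist` is a Tesseract configuration variable; how well it is honoured can depend on the Tesseract version and engine mode.

```python
def build_tesseract_config(allowed_chars: str, page_seg_mode: int = 7) -> str:
    """Build a Tesseract config string that limits output to allowed_chars.

    Hypothetical helper: page_seg_mode 7 treats the image as one text line.
    """
    if not allowed_chars or any(ch.isspace() for ch in allowed_chars):
        raise ValueError("allowed_chars must be non-empty and contain no whitespace")
    return f"--psm {page_seg_mode} -c tessedit_char_whitelist={allowed_chars}"


def read_digits(image_path: str) -> str:
    """OCR an image while allowing only the digits 0-9 in the result."""
    # Imported lazily so the config helper works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    config = build_tesseract_config("0123456789")
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, config=config).strip()
```

With a digits-only whitelist, the engine should not report a letter 'O' where a zero is printed, which is the effect described above.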

Other engines let you train them on new fonts, which will help considerably if you have an unusual font.

If your image quality is good and your fonts are clean and of a decent size, then I would recommend Tesseract OCR from Google and OCRopus, as suggested by Michael Mior. Both are free and work well on clean, clear text. If the text is more difficult, there are definitely better OCR engines out there, such as ABBYY, Prime Recognition, OmniPage and many others, although they cost money.
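To try Tesseract on a clean scan, one option is to call its command-line tool from a script. The sketch below assumes the `tesseract` executable is installed and on the PATH; the function names and the default language `eng` are illustrative assumptions, not part of the original answer.

```python
import shutil
import subprocess
from typing import List


def tesseract_command(image_path: str, language: str = "eng") -> List[str]:
    """Build the argument list for a plain Tesseract run (hypothetical helper)."""
    # Passing "stdout" as the output base makes Tesseract print the text
    # instead of writing it to a file.
    return ["tesseract", image_path, "stdout", "-l", language]


def run_tesseract(image_path: str, language: str = "eng") -> str:
    """Run Tesseract on an image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract exits with an error
    )
    return result.stdout
```

If the output on your own font is already accurate with these defaults, that is a good sign that no font-specific setup is needed.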

Andrew Cash answered Sep 28 '22 08:09
