 

Recognize numbers in images

I've been searching the web for resources on number recognition in images. I found many links covering the topic, but unfortunately they are more confusing than helpful, and I don't know where to start.

I've got an image with 5 numbers in it, undistorted (not a CAPTCHA or anything like that). The numbers are black on a white background, written in a standard font.

My first step was to separate the numbers. The algorithm I currently use is quite simple: it checks whether a column is entirely white and is therefore a space between characters. Then it trims each character so that there is no white border around it. This works quite well.
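The column-scan segmentation described above could look roughly like this in Java. This is a minimal sketch, assuming the image is loaded as a `java.awt.image.BufferedImage` and that a single brightness threshold (for example 128) separates black ink from the white background. The class name `DigitSegmenter`, its `Box` record and the threshold value are hypothetical illustrations, not the asker's code. Records require Java 16 or later.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Column-scan segmentation of dark characters on a light background (sketch). */
public class DigitSegmenter {

    /** Bounding box of one character: columns x0..x1-1, rows y0..y1-1. */
    public record Box(int x0, int y0, int x1, int y1) {}

    /** Marks a pixel as ink when its average RGB brightness is below the threshold. */
    public static boolean[][] binarize(BufferedImage img, int threshold) {
        boolean[][] ink = new boolean[img.getHeight()][img.getWidth()];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                ink[y][x] = (r + g + b) / 3 < threshold;
            }
        }
        return ink;
    }

    /** Splits the image at all-white columns, then trims white rows from each character. */
    public static List<Box> segment(boolean[][] ink) {
        int width = ink.length == 0 ? 0 : ink[0].length;
        List<Box> boxes = new ArrayList<>();
        int start = -1; // first column of the character being scanned, or -1 if none
        for (int x = 0; x <= width; x++) {
            // Treat the position just past the right edge as blank to close the last character.
            boolean blank = x == width || columnIsBlank(ink, x);
            if (!blank && start < 0) {
                start = x;                           // entering a character
            } else if (blank && start >= 0) {
                boxes.add(trimRows(ink, start, x));  // leaving a character
                start = -1;
            }
        }
        return boxes;
    }

    private static boolean columnIsBlank(boolean[][] ink, int x) {
        for (boolean[] row : ink) {
            if (row[x]) return false;
        }
        return true;
    }

    /** Columns x0..x1-1 contain ink; drop all-white rows at the top and bottom. */
    private static Box trimRows(boolean[][] ink, int x0, int x1) {
        int top = 0, bottom = ink.length;
        while (top < bottom && rowIsBlank(ink, top, x0, x1)) top++;
        while (bottom > top && rowIsBlank(ink, bottom - 1, x0, x1)) bottom--;
        return new Box(x0, top, x1, bottom);
    }

    private static boolean rowIsBlank(boolean[][] ink, int y, int x0, int x1) {
        for (int x = x0; x < x1; x++) {
            if (ink[y][x]) return false;
        }
        return true;
    }
}
```

Horizontal trimming comes for free from the column scan, because each box starts at the first inked column and ends at the next blank one; only the rows need explicit trimming. Note that two characters touching each other (no white column between them) would come back as a single box.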

But now I'm stuck on recognizing the actual numbers. I don't know the best way to identify each one. I don't think comparing directly against the font is a good idea, because if the numbers differ even slightly from it, that will no longer work.

Could anyone give me a hint on how this is done?

It doesn't matter for the question, but I'll be implementing this in C# or Java. I found some libraries that would do the job, but I'd like to implement it myself to learn something.

asked Mar 09 '10 19:03 by svens

1 Answer

Why not look at using an open source OCR engine such as Tesseract?

http://code.google.com/p/tesseract-ocr/

C# Wrapper for Tesseract

http://www.pixel-technology.com/freeware/tessnet2/

Java Wrapper for Tesseract

http://sourceforge.net/projects/tessocrinjava/

While you might not consider using a third-party library to be implementing it yourself, there's a tremendous amount of work involved in just integrating a third-party tool. Keep in mind also that something that seems simple (recognizing the number 5 versus the number 6) is often very complex; we're talking thousands and thousands of lines of code. At the very least, look at the source code for Tesseract; it'll give you a good reason to want to leverage a third-party library.

Here's another SO question that'll give you some ideas about the algorithms involved: https://stackoverflow.com/questions/850717/what-are-some-popular-ocr-algorithms
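One of the simplest ideas in this area, and one way to address the question's concern that comparing directly against the font breaks when glyphs differ slightly, is nearest-neighbour template matching on size-normalized glyphs: resample every segmented character to a small fixed grid, then pick the reference digit whose grid differs in the fewest cells. The sketch below assumes binarized glyphs (`true` = ink) and a map of reference grids built once, for instance by segmenting and normalizing an image of "0123456789" rendered in the same font. The class name `TemplateMatcher` and the 8x8 grid size are hypothetical choices; this is far cruder than what Tesseract does and is not taken from the linked question.

```java
import java.util.Map;

/** Nearest-template digit classifier on size-normalized glyphs (sketch). */
public class TemplateMatcher {

    /** Side length of the normalized grid (hypothetical choice). */
    public static final int GRID = 8;

    /**
     * Resamples the glyph in columns x0..x1-1 and rows y0..y1-1 of a binarized
     * image to a GRID x GRID grid, sampling the source pixel under each cell's centre.
     */
    public static boolean[][] normalize(boolean[][] ink, int x0, int y0, int x1, int y1) {
        int w = x1 - x0, h = y1 - y0;
        boolean[][] out = new boolean[GRID][GRID];
        for (int gy = 0; gy < GRID; gy++) {
            for (int gx = 0; gx < GRID; gx++) {
                int sx = x0 + (2 * gx + 1) * w / (2 * GRID);
                int sy = y0 + (2 * gy + 1) * h / (2 * GRID);
                out[gy][gx] = ink[sy][sx];
            }
        }
        return out;
    }

    /** Number of grid cells in which two normalized glyphs disagree. */
    public static int hamming(boolean[][] a, boolean[][] b) {
        int d = 0;
        for (int y = 0; y < GRID; y++) {
            for (int x = 0; x < GRID; x++) {
                if (a[y][x] != b[y][x]) d++;
            }
        }
        return d;
    }

    /** Returns the label of the closest template, or '?' if no templates are given. */
    public static char classify(boolean[][] glyph, Map<Character, boolean[][]> templates) {
        char best = '?';
        int bestDist = Integer.MAX_VALUE;
        for (Map.Entry<Character, boolean[][]> e : templates.entrySet()) {
            int d = hamming(glyph, e.getValue());
            if (d < bestDist) {
                bestDist = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Because every glyph is resampled to the same grid before comparison, small differences in size or stroke width flip only a few cells instead of misaligning the whole bitmap. Very narrow glyphs such as "1" get stretched to the full grid width, so in practice you may also want to compare the bounding boxes' aspect ratios.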

answered Sep 19 '22 15:09 by Keith Adler