
Using tesseract to recognize license plates

I'm developing an app that recognizes license plates (ANPR). The first step is to extract the license plates from the image. I am using OpenCV to detect the plates based on their width/height ratio, and this works pretty well:

[Images: extracting license plates]

But as you can see, the OCR results are pretty bad.

I am using tesseract in my Objective-C (iOS) environment. These are the variables I set when starting the engine:

    // Init the tesseract engine.
    tesseract = new tesseract::TessBaseAPI();
    int initRet = tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], [language UTF8String]);
    // Only recognize the characters listed here.
    tesseract->SetVariable("tessedit_char_whitelist", "BCDFGHJKLMNPQRSTVWXYZ0123456789-");
    // Penalties applied to words that are not in the dictionary.
    tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
    tesseract->SetVariable("language_model_penalty_non_dict_word", "1");
    // Do not load the system dictionary.
    tesseract->SetVariable("load_system_dawg", "0");

How can I improve the results? Do I need to let OpenCV do more image manipulation? Or is there something I can improve with tesseract?

unicorn80 asked Oct 09 '13 09:10




1 Answer

Two things will fix this completely:

  1. Remove everything that is not text from the image. Use some CV to find the plate area (for example, by color) and then mask out all of the background. The input to tesseract should be black and white, with black text and everything else white.

  2. Remove skew (as mentioned by FrankPI above). tesseract is supposed to cope with skew (see the "Tesseract OCR Engine" overview by R. Smith), but it doesn't always work, especially with a single line of text rather than a few paragraphs. So removing skew yourself first is always good, if you can do it reliably. You will probably know the exact shape of the bounding trapezoid of the plate from step 1, so this should not be too hard. While removing skew, you can also remove perspective: license plates (usually) all use the same font, so if you scale them to the same perspective-free shape, the letter shapes will be essentially the same, which helps text recognition.
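As a rough illustration of step 1, here is a minimal sketch in plain C++ (no OpenCV). It assumes a grayscale image stored as a flat byte vector, dark characters on a lighter plate, and a plate mask that you already have from your detection step. The names `otsuThreshold` and `binarizePlate` are made up for this example. It picks a global threshold with Otsu's method and turns everything that is not dark plate text white.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Otsu's method: choose the gray level that maximizes the between-class
// variance of the two pixel groups it separates.
int otsuThreshold(const std::vector<std::uint8_t>& gray) {
    std::array<double, 256> hist{};
    for (std::uint8_t v : gray) hist[v] += 1.0;

    const double total = static_cast<double>(gray.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double sumBelow = 0.0, weightBelow = 0.0, bestVariance = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightBelow += hist[t];
        if (weightBelow == 0.0) continue;      // no pixels at or below t yet
        const double weightAbove = total - weightBelow;
        if (weightAbove == 0.0) break;         // every pixel is at or below t
        sumBelow += t * hist[t];
        const double diff = sumBelow / weightBelow - (sumAll - sumBelow) / weightAbove;
        const double variance = weightBelow * weightAbove * diff * diff;
        if (variance > bestVariance) { bestVariance = variance; best = t; }
    }
    return best;
}

// Black (0) for dark pixels inside the plate, white (255) everywhere else.
// plateMask is hypothetical input: nonzero marks pixels inside the plate.
std::vector<std::uint8_t> binarizePlate(const std::vector<std::uint8_t>& gray,
                                        const std::vector<std::uint8_t>& plateMask) {
    assert(gray.size() == plateMask.size());
    std::vector<std::uint8_t> platePixels;
    for (std::size_t i = 0; i < gray.size(); ++i)
        if (plateMask[i]) platePixels.push_back(gray[i]);

    const int threshold = otsuThreshold(platePixels);
    std::vector<std::uint8_t> out(gray.size(), 255);
    for (std::size_t i = 0; i < gray.size(); ++i)
        if (plateMask[i] && gray[i] <= threshold) out[i] = 0;
    return out;
}
```

If you are already using OpenCV, `cv::threshold` with the `cv::THRESH_OTSU` flag covers the thresholding part, so you would only need the masking yourself.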

Some further pointers...

Don't try to code this at first. Take a picture of a plate that is really easy to OCR (i.e., from directly in front, no perspective), edit it in Photoshop (or GIMP), and run it through tesseract on the command line. Keep editing in different ways until this works. For example: select by color (or flood-select the letter shapes), fill with black, invert the selection, fill with white, apply a perspective transform so the corners of the plate form a rectangle, etc. Then take a bunch of pictures, some harder (maybe from odd angles), and do the same with all of them. Once this works completely, think about how to make a CV algorithm that does the same thing you did in Photoshop :)
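When you do get to the coding stage, the "perspective transform so the corners form a rectangle" step comes down to finding a 3x3 homography from the four plate corners to the four corners of an upright rectangle. Below is a minimal sketch in plain C++, assuming you already have the four corners in a consistent order and that no three of them are collinear. `Point`, `findHomography` and `warpPoint` are names invented for this example, not a library API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

struct Point { double x, y; };
using Homography = std::array<double, 9>;  // row-major 3x3, last entry fixed to 1

// Solve for the homography that maps src[i] onto dst[i] for four point pairs.
Homography findHomography(const std::array<Point, 4>& src,
                          const std::array<Point, 4>& dst) {
    // Each pair gives two rows of an 8x8 linear system; column 8 is the right-hand side.
    double a[8][9] = {};
    for (int i = 0; i < 4; ++i) {
        const double x = src[i].x, y = src[i].y, u = dst[i].x, v = dst[i].y;
        double* r0 = a[2 * i];
        double* r1 = a[2 * i + 1];
        r0[0] = x; r0[1] = y; r0[2] = 1; r0[6] = -u * x; r0[7] = -u * y; r0[8] = u;
        r1[3] = x; r1[4] = y; r1[5] = 1; r1[6] = -v * x; r1[7] = -v * y; r1[8] = v;
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 8; ++col) {
        int pivot = col;
        for (int row = col + 1; row < 8; ++row)
            if (std::fabs(a[row][col]) > std::fabs(a[pivot][col])) pivot = row;
        for (int k = 0; k < 9; ++k) std::swap(a[col][k], a[pivot][k]);
        assert(std::fabs(a[col][col]) > 1e-12);  // degenerate corner layout
        for (int row = 0; row < 8; ++row) {
            if (row == col) continue;
            const double f = a[row][col] / a[col][col];
            for (int k = col; k < 9; ++k) a[row][k] -= f * a[col][k];
        }
    }
    Homography h{};
    for (int i = 0; i < 8; ++i) h[i] = a[i][8] / a[i][i];
    h[8] = 1.0;
    return h;
}

// Map a single point through the homography (with the perspective divide).
Point warpPoint(const Homography& h, Point p) {
    const double w = h[6] * p.x + h[7] * p.y + h[8];
    return {(h[0] * p.x + h[1] * p.y + h[2]) / w,
            (h[3] * p.x + h[4] * p.y + h[5]) / w};
}
```

This only computes the mapping; you would still have to resample the image through it. With OpenCV, `cv::getPerspectiveTransform` and `cv::warpPerspective` do both parts.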

P.S. Also, it is better to start with a higher-resolution image if possible. It looks like the text in your example is around 14 pixels tall. tesseract works pretty well with 12-point text at 300 dpi, which is about 50 pixels tall (12 / 72 inch × 300 dpi), and it works much better at 600 dpi. Try to make your letters at least 50, preferably 100, pixels tall.
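The arithmetic behind those numbers is simple enough to write down. This is a small sketch, assuming the usual convention of 72 points per inch; `pointsToPixels` and `upscaleFactor` are made-up helper names.

```cpp
#include <cassert>

// One typographic point is 1/72 inch, so pixel height = points * dpi / 72.
double pointsToPixels(double points, double dpi) {
    assert(points > 0.0 && dpi > 0.0);
    return points * dpi / 72.0;
}

// Factor by which a crop must be enlarged so that letters of
// currentHeightPx reach targetHeightPx. Never shrinks (returns 1.0).
double upscaleFactor(double currentHeightPx, double targetHeightPx) {
    assert(currentHeightPx > 0.0 && targetHeightPx > 0.0);
    return currentHeightPx >= targetHeightPx ? 1.0
                                             : targetHeightPx / currentHeightPx;
}
```

For the roughly 14-pixel letters in the question, `upscaleFactor(14.0, 50.0)` comes out at about 3.57. Interpolating a small crop up by that factor will not bring back lost detail, so capturing at a higher resolution is still the better option.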

P.P.S. Are you doing anything to train tesseract? I think you have to, because the font here is different enough to be a problem. You probably also need something to recognize (and not penalize) dashes, which will be very common in your texts; it looks like in the second example "T-" is recognized as "H".

Alex I answered Oct 02 '22 14:10
