
How to make Tesseract ignore noise?

I have an image like this (white background, black text). When there is no noise, Tesseract recognizes the numbers very well. As you can see, though, there is a lot of noise above and below the line of numbers.

When noise is present, Tesseract tries to recognize it as digits and adds extra numbers to the result, which is really bad. How can I make Tesseract ignore the noise? Preprocessing the image to increase contrast or sharpen the text is not an option; it doesn't help at all.

If some tool could isolate only the line of text, that would be a really good input for Tesseract. Please help me. Thanks, everybody.

enter image description here

Bằng Rikimaru asked Apr 07 '13 13:04


2 Answers

You should try eroding and dilating:

The most basic morphological operations are two: Erosion and Dilation. They have a wide array of uses, i.e. :

Removing noise

...
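As a rough illustration of why erosion followed by dilation (a morphological "opening") removes speckle, here is a minimal pure-Python sketch. It assumes a binary image stored as a list of lists where 1 means ink and 0 means background, and a square 3x3 structuring element; the helper names (`erode`, `dilate`, `opening`) are made up for this example. In practice you would more likely use OpenCV's `cv2.erode` and `cv2.dilate` (or `cv2.morphologyEx` with `cv2.MORPH_OPEN`). Note that OpenCV treats bright pixels as foreground, so black text on a white background would need to be inverted first.

```python
def _window(img, r, c, radius):
    """Yield the in-bounds pixel values of the square window centred at (r, c)."""
    rows, cols = len(img), len(img[0])
    for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
        for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
            yield img[rr][cc]


def erode(img, radius=1):
    """Keep a pixel only if every pixel in its window is ink (1)."""
    return [[1 if all(_window(img, r, c, radius)) else 0
             for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img, radius=1):
    """Set a pixel if any pixel in its window is ink (1)."""
    return [[1 if any(_window(img, r, c, radius)) else 0
             for c in range(len(img[0]))]
            for r in range(len(img))]


def opening(img, radius=1):
    """Erode, then dilate: removes blobs smaller than the window,
    while larger shapes are roughly restored to their original size."""
    return dilate(erode(img, radius), radius)


# Toy example: one thick stroke plus three isolated noise pixels.
clean = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
noisy = [row[:] for row in clean]
for r, c in [(0, 7), (6, 0), (5, 7)]:
    noisy[r][c] = 1  # add speckle

for row in opening(noisy):
    print("".join("#" if v else "." for v in row))
```

The catch is that strokes thinner than the window are erased along with the noise, so the window size has to stay below the stroke width of the digits.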

ArtemStorozhuk answered Oct 03 '22 14:10


You could try to downsample your binary image and then upsample it again (pyrDown and pyrUp), or you could smooth the image with a Gaussian blur. And, as already suggested, erode and dilate your image.
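To show the idea behind downsampling and upsampling again, here is a small pure-Python sketch. It is a simplified stand-in, not what OpenCV does: `cv2.pyrDown` and `cv2.pyrUp` apply Gaussian smoothing as part of resampling, whereas this version just averages 2x2 blocks, repeats pixels, and re-thresholds. The function names and the 0.5 threshold are assumptions made for the example; the image is again a list of lists with 1 for ink.

```python
def downsample(img):
    """Halve each dimension by averaging non-overlapping 2x2 blocks.
    A trailing odd row or column is dropped."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, len(img[0]) - 1, 2)]
            for r in range(0, len(img) - 1, 2)]


def upsample(img):
    """Double each dimension by repeating every pixel (nearest neighbour)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(wide[:])
    return out


def binarize(img, threshold=0.5):
    """Map grey values back to 0/1."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]


def suppress_speckle(img, threshold=0.5):
    """Down, up, and re-threshold: a lone pixel averages to 0.25 and vanishes."""
    return binarize(upsample(downsample(img)), threshold)


# Toy example: a 4x4 block of ink plus two isolated noise pixels.
target = [[1 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)]
          for r in range(8)]
speckled = [row[:] for row in target]
speckled[0][7] = 1
speckled[7][0] = 1

for row in suppress_speckle(speckled):
    print("".join("#" if v else "." for v in row))
```

Like erosion, this costs detail: shapes that do not line up with the 2x2 grid have their edges shifted or blurred, so it is worth checking the result on real digits before feeding it to Tesseract.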

sschrass answered Oct 03 '22 14:10