I have an image like this (white background and black text). If there is no noise, Tesseract recognizes the numbers very well. But as you can see, there is a lot of noise above and below the line of numbers.
When there is noise, Tesseract tries to recognize it as numbers and adds extra digits to the result. That is really bad. How can I make Tesseract ignore the noise? Preprocessing the image to increase the contrast or sharpen the text doesn't help at all.
If some tool could highlight only the line of text, that would be a really good input for Tesseract. Please help me. Thanks, everybody.
You should try eroding and dilating:
The two most basic morphological operations are erosion and dilation. They have a wide array of uses, e.g.:
Removing noise
...
You could try downsampling your binary image and upsampling it again (pyrDown and pyrUp), or you could try smoothing your image with a Gaussian blur. And, as already suggested, erode and dilate your image.