
Different results with Tesseract on the same image

Hello, I am trying to run OCR on an image.

enter image description here

This is the original image. After some preprocessing (skipped here since it isn't really related to my question, but I'll share it if somebody needs it)

I got this image:

enter image description here

When I run OCR on this image with Tesseract,

I get this result:

HN'

2809

However, when I manually crop the image to half in Photoshop

enter image description here

I receive

HN'

Z8

as a result.

I wonder what the difference between those two images is, because one gives 2 instead of Z while the other gives Z.

I know I have to smooth the edges for more accurate results, but neither motion blur, Gaussian blur, nor an ordinary blur filter changed the results I'm getting.

Anar Bayramov asked Oct 19 '22 19:10



1 Answer

Tesseract implements an algorithm that picks the digit 2 over the letter Z based on the number and type of characters in the neighbourhood:

  • In the first image, it guesses 2 rather than Z because its neighbours are all digits (809), so it assumes that the first character must also be a digit.

  • In the cropped image, presumably only the 8 remains as a neighbour, so there is less digit context and it keeps Z.
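To make that reasoning concrete, here is a toy sketch of neighbour-based disambiguation in Python. It is not Tesseract's actual code: the look-alike table and the function name `resolve_by_neighbours` are made up for illustration, and the simple majority vote is only an assumption about the kind of heuristic involved.

```python
# Toy illustration only. This is NOT how Tesseract is implemented;
# the look-alike table and the majority-vote rule are assumptions.
DIGIT_FOR_LETTER = {"Z": "2", "O": "0", "S": "5", "B": "8", "I": "1"}
LETTER_FOR_DIGIT = {digit: letter for letter, digit in DIGIT_FOR_LETTER.items()}


def resolve_by_neighbours(token: str) -> str:
    """Rewrite look-alike characters to match the majority character class."""
    digits = sum(ch.isdigit() for ch in token)
    letters = sum(ch.isalpha() for ch in token)
    if digits > letters:
        # Mostly digits: treat look-alike letters as digits (Z -> 2).
        return "".join(DIGIT_FOR_LETTER.get(ch, ch) for ch in token)
    if letters > digits:
        # Mostly letters: treat look-alike digits as letters (2 -> Z).
        return "".join(LETTER_FOR_DIGIT.get(ch, ch) for ch in token)
    # Tie: not enough context either way, so leave the token alone.
    return token


if __name__ == "__main__":
    for raw in ("Z809", "Z8", "HN'"):
        print(raw, "->", resolve_by_neighbours(raw))
```

With three digits next to it, the Z in `Z809` is outvoted and becomes 2, while `Z8` is a tie and stays as it is, which mirrors the two results in the question.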

I had this problem before. :(

By the way, I think you should flip the first part of the image so HN' becomes .NH.

karlphillip answered Oct 22 '22 23:10
