Is there any way to improve tesseract OCR with small fonts?

Question

I'm trying to use tesseract-OCR via python-tesseract to read a low resolution font that looks like this:

[image: sample of the low-resolution text]

Unfortunately that image returns

ZIJZHZI

I think the resolution is too low and that is causing the problem. I've tried magnifying the image and cropping it down to individual characters, but neither of these provides much improvement. Is there anything else I should consider doing, preferably something that could be done using the Python Imaging Library? Or should I just give up and train tesseract?

For what it's worth, the PIL has the following built-in filters:

BLUR, CONTOUR, DETAIL, EDGE_ENHANCE,
EDGE_ENHANCE_MORE, EMBOSS, FIND_EDGES,
SMOOTH, SMOOTH_MORE, and SHARPEN

Hristo Hristov · Accepted Answer

I tried magnifying the image with ImageMagick:

  convert -resize 400% in.bmp out.bmp

And then read it:

  tesseract out.bmp res

The result is correct.
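Since the question asks for something that can be done with the Python Imaging Library, here is a rough sketch of the same two steps in Python. It assumes Pillow and the pytesseract wrapper are installed; the helper names `upscale` and `read_text`, the LANCZOS resampling choice, and the greyscale conversion are my own assumptions, not something tested against the image in the question.

```python
from PIL import Image


def upscale(img, factor=4):
    """Return a copy of img enlarged by an integer factor (4 matches -resize 400%)."""
    width, height = img.size
    # LANCZOS is an assumed choice; BICUBIC or NEAREST may suit some fonts better.
    return img.resize((width * factor, height * factor), Image.LANCZOS)


def read_text(path, factor=4):
    """Enlarge the image at `path`, then pass it to Tesseract (hypothetical helper)."""
    # Imported here so upscale() still works when pytesseract is not installed.
    import pytesseract

    img = Image.open(path).convert("L")  # greyscale before resizing
    return pytesseract.image_to_string(upscale(img, factor)).strip()
```

Calling `read_text("in.bmp")` should then be roughly equivalent to the two commands above. If the output is still wrong, trying a different `factor` or resampling filter is probably the first thing to vary.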

Tags:

python-imaging-library

ocr

tesseract

Riazm
