Tesseract-OCR (3.02) recognition accuracy and speed

Question

I have a group of very small images (width 70-100, height 12-20 pixels), like the one below:

Each image contains nothing but the nickname of a group member. I want to read the text from these simple images; they all share the same background, and only the nicknames differ. Here is what I've done with that image:

I am using the code below to get the text from the second image:

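#include <string>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

tesseract::TessBaseAPI ocr;
ocr.Init(NULL, "eng");
PIX* pix = pixRead("D:\\image.png");  // the backslash must be escaped in a C++ string literal
ocr.SetImage(pix);
char* text = ocr.GetUTF8Text();       // the caller owns the returned buffer
std::string result(text);
delete[] text;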

I have two problems with that:

1. ocr.GetUTF8Text() is slow: 650-750 ms. The image is small, so why does it take so long?
2. From the image above I get results like "iwillkillsm", "iwillkillsel", etc. The image is simple, and I believe Tesseract gurus can get it recognized with 100% accuracy.

What should I do with the image or the code, or what should I read (and where) about tesseract-ocr recognition speed and quality, to solve these problems?

nlloyd · Accepted Answer

It may sound odd, but I've always had the best luck with tesseract when I increased the dimensions of the image. The image would look "worse" to me, but tesseract ran faster and was much more accurate.

There is a limit to how big you can make the images before you start getting worse results, however :) I think I remember shooting for about 600px in the past. You'll have to play with it though.
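To make the upscaling idea concrete, here is a minimal, library-free sketch of enlarging a small grayscale image before handing it to the OCR engine. The names GrayImage, scaleForTargetWidth and upscaleBilinear are hypothetical helpers made up for this illustration, and treating the remembered 600px as a target width is an assumption, not a tuned value. In a real Leptonica pipeline you would more likely call pixScale(pix, factor, factor) on the PIX before ocr.SetImage(pix) rather than resample by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: an 8-bit grayscale image stored row by row.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // size == width * height
};

// Scale factor that brings an image of `width` pixels to about `targetWidth`.
// Never returns less than 1.0, so images are only ever enlarged.
double scaleForTargetWidth(int width, int targetWidth) {
    assert(width > 0 && targetWidth > 0);
    return std::max(1.0, static_cast<double>(targetWidth) / width);
}

// Clamp v into the range [0, hi].
static double clampTo(double v, double hi) {
    return std::max(0.0, std::min(v, hi));
}

// Enlarge `src` by `factor` in both dimensions using bilinear interpolation.
GrayImage upscaleBilinear(const GrayImage& src, double factor) {
    assert(factor >= 1.0 && src.width > 0 && src.height > 0);
    assert(src.pixels.size() ==
           static_cast<std::size_t>(src.width) * src.height);

    GrayImage dst;
    dst.width = static_cast<int>(std::lround(src.width * factor));
    dst.height = static_cast<int>(std::lround(src.height * factor));
    dst.pixels.resize(static_cast<std::size_t>(dst.width) * dst.height);

    for (int y = 0; y < dst.height; ++y) {
        // Map the destination pixel centre back into source coordinates.
        const double sy = clampTo((y + 0.5) / factor - 0.5, src.height - 1.0);
        const int y0 = static_cast<int>(sy);
        const int y1 = std::min(y0 + 1, src.height - 1);
        const double fy = sy - y0;

        for (int x = 0; x < dst.width; ++x) {
            const double sx = clampTo((x + 0.5) / factor - 0.5, src.width - 1.0);
            const int x0 = static_cast<int>(sx);
            const int x1 = std::min(x0 + 1, src.width - 1);
            const double fx = sx - x0;

            // Blend the four neighbouring source pixels.
            const double top = src.pixels[y0 * src.width + x0] * (1.0 - fx) +
                               src.pixels[y0 * src.width + x1] * fx;
            const double bottom = src.pixels[y1 * src.width + x0] * (1.0 - fx) +
                                  src.pixels[y1 * src.width + x1] * fx;
            const double value = top * (1.0 - fy) + bottom * fy;

            dst.pixels[static_cast<std::size_t>(y) * dst.width + x] =
                static_cast<std::uint8_t>(std::lround(value));
        }
    }
    return dst;
}
```

For a 100x20 input and a 600px target, scaleForTargetWidth gives a factor of 6, so the result is 600x120. As the answer says, the right target has to be found by experiment, so it is worth timing and checking recognition at a few different factors.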

Tags: image, tesseract

Anton Kasabutski
