How to get coordinates of recognized characters

Question

I have a very simple OCR app based on Tesseract. After the recognition step, I also provide a user verification step that allows corrections in case the OCR is wrong. To improve the user interface, I plan to draw a rectangle over each OCR-ed character on the original input image and show it side by side with the OCR output. For that, I need the coordinates of the recognized characters.

I tried something like this but it seems to give me gibberish:

   ETEXT_DESC output;
   tess->Recognize(&output);
   text = tess->GetUTF8Text();

Now if I access output.count, I get a value above 10,000, which is obviously wrong because the whole image contains only 20 or so characters.

Am I on the right track? Can I have some direction please?

der_chirurg · Accepted Answer

It may help to get the coordinates from a box file. Try the tesseract executable with the makebox option:

   tesseract.exe [image] [output] makebox

Afterwards you get the coordinates of each recognized character, one per line. You can then compare them against your OCR output.
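To draw rectangles from that output, each line has to be parsed and converted to image coordinates. Below is a minimal sketch, assuming the usual box file layout of `<symbol> <left> <bottom> <right> <top> [page]` with the origin at the bottom-left corner of the image. `CharBox` and `parseBoxLine` are hypothetical names made up for this example, not part of Tesseract, and the image height has to come from your own image-loading code.

```cpp
#include <sstream>
#include <string>

// One character from a box file, converted to a top-left-origin rectangle.
// (Hypothetical helper type, not a Tesseract type.)
struct CharBox {
    std::string symbol;  // recognized character (UTF-8)
    int x = 0;           // left edge in pixels
    int y = 0;           // top edge in pixels, measured from the top of the image
    int width = 0;
    int height = 0;
    int page = 0;        // page index; stays 0 if the line has no page field
};

// Parses one line of the assumed form "<symbol> <left> <bottom> <right> <top> [page]".
// Box files measure y from the bottom of the image, so the top edge is flipped
// using imageHeight. Returns false if the line does not match the layout.
bool parseBoxLine(const std::string& line, int imageHeight, CharBox& out) {
    std::istringstream in(line);
    std::string symbol;
    int left = 0, bottom = 0, right = 0, top = 0;
    if (!(in >> symbol >> left >> bottom >> right >> top)) {
        return false;
    }
    int page = 0;
    in >> page;  // optional trailing field; page remains 0 when it is absent

    out.symbol = symbol;
    out.x = left;
    out.y = imageHeight - top;  // flip from bottom-left to top-left origin
    out.width = right - left;
    out.height = top - bottom;
    out.page = page;
    return true;
}
```

Read the .box file line by line with std::getline, pass each line to `parseBoxLine` together with the height of the input image, and draw the resulting rectangle with whatever drawing API your UI uses.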

Tags:

tesseract

Haoest
