How to get coordinates of recognized characters

Tags: tesseract

I have a very simple OCR app based on Tesseract. After the recognition step, I also provide a user verification step that allows corrections in case the OCR is wrong. To improve the user interface, I plan to draw a rectangle around each OCR-ed character on the original input image and show it side by side with the OCR output. To do that, I need the coordinates of the recognized characters.

I tried something like this but it seems to give me gibberish:

   ETEXT_DESC output;
   tess->Recognize(&output);
   text = tess->GetUTF8Text();

Now if I access output.count, it gives me some value above 10,000, which is obviously wrong because the whole image only has 20 or so characters.

Am I on the right track? Can I have some direction please?

Haoest asked Sep 06 '11 03:09


1 Answer

It may help to get the coordinates of the character boxes. Try the tesseract command-line executable with this command:

   tesseract.exe [image] [output] makebox

Afterwards you get the coordinates of each character, one per row, in the file [output].box. You can then compare them against the original image.
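To draw the rectangles from that output, each row of the box file has to be parsed and its y coordinates flipped. Here is a minimal sketch, assuming the usual box file layout of `<symbol> <left> <bottom> <right> <top> <page>` with the origin at the bottom-left corner of the image (some older versions may omit the page field, so it is treated as optional). The names `CharBox`, `Rect`, `parseBoxLine` and `toImageRect` are made up for illustration and are not part of Tesseract.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One row of a box file (assumed layout: symbol left bottom right top [page]).
struct CharBox {
    std::string symbol;  // recognized character, UTF-8
    int left;            // pixel coordinates, origin at the bottom-left corner
    int bottom;
    int right;
    int top;
    int page;            // 0 when the row has no page field
};

// Rectangle in top-left-origin coordinates, as most drawing APIs expect.
struct Rect {
    int x;
    int y;
    int width;
    int height;
};

// Parses one box-file row into `out`. Returns false (leaving `out`
// untouched) if the row does not contain a symbol and four integers.
bool parseBoxLine(const std::string& line, CharBox& out) {
    std::istringstream in(line);
    CharBox b;
    if (!(in >> b.symbol >> b.left >> b.bottom >> b.right >> b.top)) {
        return false;
    }
    if (!(in >> b.page)) {
        b.page = 0;  // page field missing: assume a single-page image
    }
    out = b;
    return true;
}

// Flips the y axis so the box can be drawn on a top-left-origin image.
Rect toImageRect(const CharBox& b, int imageHeight) {
    assert(imageHeight > 0);
    return Rect{b.left, imageHeight - b.top, b.right - b.left, b.top - b.bottom};
}
```

You would read [output].box line by line with std::getline, call parseBoxLine on each row, and pass the height of your input image to toImageRect before drawing.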

der_chirurg answered Sep 25 '22 21:09