
Android Tesseract & Leptonica OCR. Read individual box and char

I'm working with Tesseract on Android, and I have the following code to extract the string and the boxes read from an image:

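TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(tess_path, "eng");
baseApi.setImage(bitmap);
// Full recognized text, including the spaces and line breaks between words.
String recognizedText = baseApi.getUTF8Text();
// Character segmentation: fetch the Pixa once and read its boxes from it.
Pixa pixa = baseApi.getCharacters();
ArrayList<Rect> boxes = pixa.getBoxRects();
baseApi.end();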

Here I can see the text and the box of each character, but sometimes the text has a different length than the boxes array, so it is impossible to match each box to the character that was read.

Is there any way to obtain each character together with its exact box?

user2021731 asked Nov 20 '25 19:11

1 Answer

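Use a ResultIterator instead of getCharacters(). The iterator reports the recognized text and its confidence one element at a time, so every value you read belongs to the same element. The loop below steps word by word; pass PageIteratorLevel.RIL_SYMBOL instead of PageIteratorLevel.RIL_WORD to step through individual characters.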

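// Iterate through the results.
// Get the iterator after getUTF8Text() has run recognition and before baseApi.end().
final ResultIterator iterator = baseApi.getResultIterator();
String lastUTF8Text;
float lastConfidence;
int count = 0;
iterator.begin();
do {
    // Text and confidence of the element the iterator currently points at.
    lastUTF8Text = iterator.getUTF8Text(PageIteratorLevel.RIL_WORD);
    lastConfidence = iterator.confidence(PageIteratorLevel.RIL_WORD);
    count++;
} while (iterator.next(PageIteratorLevel.RIL_WORD)); // false once no element is left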
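
The loop above keeps only the last text and confidence, so here is a minimal sketch of collecting every symbol together with its box in the same iteration step. To keep it runnable without Android, it uses a hypothetical SymbolCursor interface as a stand-in for the result iterator. SymbolCursor, SymbolBox, collect and fakeCursor are illustrative names, not part of tess-two. In a real app, text() would presumably map to iterator.getUTF8Text(PageIteratorLevel.RIL_SYMBOL) and box() to the iterator's bounding-box accessor at the same level, assuming your tess-two version exposes one.

```java
import java.util.ArrayList;
import java.util.List;

public class SymbolBoxes {

    /** Hypothetical stand-in for the OCR result iterator (not a tess-two type). */
    interface SymbolCursor {
        void begin();     // rewind to the first symbol
        boolean next();   // advance; returns false when no symbol is left
        String text();    // text of the current symbol, may be null
        int[] box();      // {left, top, right, bottom} of the current symbol
    }

    /** One recognized symbol together with its bounding box. */
    static final class SymbolBox {
        final String text;
        final int left, top, right, bottom;

        SymbolBox(String text, int[] b) {
            this.text = text;
            this.left = b[0];
            this.top = b[1];
            this.right = b[2];
            this.bottom = b[3];
        }

        @Override
        public String toString() {
            return "'" + text + "' @ [" + left + "," + top + "," + right + "," + bottom + "]";
        }
    }

    /** Reads text and box in the same step, so each box belongs to the text read with it. */
    static List<SymbolBox> collect(SymbolCursor cursor) {
        List<SymbolBox> result = new ArrayList<>();
        cursor.begin();
        do {
            String text = cursor.text();
            // Skip positions where nothing was recognized.
            if (text != null && !text.isEmpty()) {
                result.add(new SymbolBox(text, cursor.box()));
            }
        } while (cursor.next());
        return result;
    }

    /** In-memory cursor used only to exercise collect(); purely illustrative. */
    static SymbolCursor fakeCursor(final String[] texts, final int[][] boxes) {
        return new SymbolCursor() {
            private int i = 0;
            public void begin() { i = 0; }
            public boolean next() { return ++i < texts.length; }
            public String text() { return i < texts.length ? texts[i] : null; }
            public int[] box() { return boxes[i]; }
        };
    }

    public static void main(String[] args) {
        String[] texts = {"H", "i"};
        int[][] boxes = {{10, 5, 22, 30}, {24, 5, 30, 30}};
        for (SymbolBox sb : collect(fakeCursor(texts, boxes))) {
            System.out.println(sb);
        }
    }
}
```

Because each entry is built from a single iterator position, the list of characters and the list of boxes cannot end up with different lengths, which is the mismatch described in the question.
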
rmtheis answered Nov 23 '25 08:11
