I'm working with Tesseract on Android, and I have the following code to extract the string and the boxes read from an image:
TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(tess_path, "eng");
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
ArrayList<Rect> boxes = baseApi.getCharacters().getBoxRects();
Pixa pixa = baseApi.getCharacters();
baseApi.end();
Here I can see the text and the box of each character, but sometimes the text has a different length than the boxes array, so it is impossible to match each box to the character that was read.
Is there any way to get each character together with its exact box?
Use a ResultIterator instead of getCharacters():
// Iterate through the results one symbol (character) at a time.
// RIL_SYMBOL yields single characters; RIL_WORD would yield whole words.
// Do this after getUTF8Text() and before baseApi.end().
final ResultIterator iterator = baseApi.getResultIterator();
String lastUTF8Text;
float lastConfidence;
int count = 0;
iterator.begin();
do {
    lastUTF8Text = iterator.getUTF8Text(PageIteratorLevel.RIL_SYMBOL);
    lastConfidence = iterator.confidence(PageIteratorLevel.RIL_SYMBOL);
    count++;
} while (iterator.next(PageIteratorLevel.RIL_SYMBOL));
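To keep every character paired with its own box, the text and the box can be read in the same loop step instead of from two separate lists. Here is a minimal sketch of that idea in plain Java. `SymbolIterator`, `SymbolBox`, and `collect` are hypothetical names invented for illustration, not part of tess-two; the interface only stands in for the iterator so the sketch can run without an Android device.

```java
import java.util.ArrayList;
import java.util.List;

final class SymbolBoxes {

    /** Hypothetical stand-in for the OCR result iterator (not a tess-two type). */
    interface SymbolIterator {
        void begin();      // rewind to the first symbol
        boolean next();    // advance one symbol; false when there are no more
        String text();     // text of the current symbol, or null if there is none
        int[] box();       // box of the current symbol: {left, top, right, bottom}
    }

    /** One recognized symbol together with the box it was read from. */
    static final class SymbolBox {
        final String symbol;
        final int[] box;

        SymbolBox(String symbol, int[] box) {
            this.symbol = symbol;
            this.box = box;
        }
    }

    /**
     * Reads text and box in the same iteration step, so entry i of the
     * result always holds a symbol and the box that belongs to it.
     */
    static List<SymbolBox> collect(SymbolIterator it) {
        List<SymbolBox> pairs = new ArrayList<>();
        it.begin();
        do {
            String symbol = it.text();
            if (symbol != null) {   // skip positions with no recognized text
                pairs.add(new SymbolBox(symbol, it.box()));
            }
        } while (it.next());
        return pairs;
    }
}
```

On a device, an adapter would forward `text()` to `iterator.getUTF8Text(PageIteratorLevel.RIL_SYMBOL)` and `next()` to `iterator.next(PageIteratorLevel.RIL_SYMBOL)`. For `box()`, I believe tess-two's PageIterator offers `getBoundingBox(level)` (and `getBoundingRect(level)` for an `android.graphics.Rect`), but check that against the library version you are using.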