
Android OCR tesseract: using data from Pixa objects to display bounding boxes

I am currently playing around with OCR on Android. I wrote a small app with a camera preview, and I am feeding images from my onPreviewFrame method to the Tesseract tools (tess-two). Now I want to display the bounding rectangles from the OCR on my camera preview. The TessBaseAPI provides methods that return character/word bounding boxes. The type of the returned object is Pixa, as in the Leptonica library provided with tess-two.

So my question is: how do I get usable coordinates, which I can use to draw the bounding boxes on my camera preview, from the Pixa objects returned by getCharacters() or getWords() of the TessBaseAPI?

GetCharacters() and getWords() in the BaseAPI

Leptonica's Pixa class

Important:

Because the preview's only supported image format is YUV NV21, and from what I have read the Tesseract API requires ARGB_8888 bitmaps, I use the following workaround in my onPreviewFrame method right before I feed the bitmap to the TessBaseAPI. (I am also rotating by 90 degrees clockwise because I am using the camera in portrait orientation, but the camera's preview frames come in landscape.)

// byte[] bmpdata <- the image in a byte array (NV21 image format) in onPreviewFrame
// Wrap the raw NV21 preview bytes so they can be compressed to JPEG
YuvImage yuvimage = new YuvImage(bmpdata, ImageFormat.NV21, width, height, null);

// Compress the full frame to JPEG in memory
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
Rect rect = new Rect(0, 0, width, height);
yuvimage.compressToJpeg(rect, 100, outStream);

// Decode the JPEG back into a Bitmap
Bitmap bmp = BitmapFactory.decodeByteArray(outStream.toByteArray(), 0, outStream.size());

// Rotate 90 degrees clockwise (landscape frame -> portrait)
Matrix mtx = new Matrix();
mtx.preRotate(90);
bmp = Bitmap.createBitmap(bmp, 0, 0, bmp.getWidth(), bmp.getHeight(), mtx, false);
// Copy into a mutable ARGB_8888 bitmap
bmp = bmp.copy(Bitmap.Config.ARGB_8888, true);

// Hand the bitmap to the OCR engine
TessTBaseApi.setImage(bmp);

So basically, I wrap the NV21 byte[] I got from the camera in a YuvImage, compress that to a JPEG, and decode the JPEG into a bitmap. I searched the web a lot for a way to get a bitmap/JPEG from the NV21 array, and this was the easiest I found. This bitmap is then fed to the Tesseract OCR. This brings me to my second question:

How, after these compressions and the 90-degree rotation, do I locate where I have to draw the boxes on screen (relative to the frame before the compressions and the rotation)?
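
A minimal sketch of the coordinate arithmetic involved, written in plain Java so it runs without Android. It assumes the bitmap was rotated exactly 90 degrees clockwise as in the snippet above, that the JPEG round trip leaves the pixel dimensions unchanged, and that the preview fills the overlay view without cropping or letterboxing. `BoxMapper`, `Box`, `scaleToView`, and `toLandscapeFrame` are hypothetical names and not part of tess-two; `Box` is only a stand-in for android.graphics.Rect.

```java
final class BoxMapper {

    /** Stand-in for android.graphics.Rect (hypothetical, for illustration only). */
    static final class Box {
        final int left, top, right, bottom;

        Box(int left, int top, int right, int bottom) {
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }

        @Override
        public String toString() {
            return "Box(" + left + ", " + top + ", " + right + ", " + bottom + ")";
        }
    }

    /**
     * Scales a box from bitmap pixels to view pixels.
     * Assumes the bitmap is stretched to fill the whole view.
     */
    static Box scaleToView(Box b, int bmpW, int bmpH, int viewW, int viewH) {
        float sx = (float) viewW / bmpW;
        float sy = (float) viewH / bmpH;
        return new Box(Math.round(b.left * sx), Math.round(b.top * sy),
                       Math.round(b.right * sx), Math.round(b.bottom * sy));
    }

    /**
     * Undoes a 90-degree clockwise rotation. A point (x, y) in the original
     * landscape frame lands at (frameH - y, x) in the rotated bitmap, so the
     * inverse is x = y', y = frameH - x'. frameH is the height of the
     * original (unrotated) frame.
     */
    static Box toLandscapeFrame(Box b, int frameH) {
        return new Box(b.top, frameH - b.right, b.bottom, frameH - b.left);
    }
}
```

For example, with a 640x480 preview frame, a box (100, 50, 200, 80) in the rotated 480x640 bitmap corresponds to (50, 280, 80, 380) in the original landscape frame. If the overlay is drawn in portrait on top of a portrait preview, only the scaling step should be needed.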

This might not be the best or even a good way to supply the OCR with live frames. I would very much appreciate comments, other solutions, or suggestions for optimization.

I started this project two days ago and am a complete beginner in programming for Android and OCR. During these two days this page helped me a lot and answered the questions I had so far very well, so thanks for that, and thank you in advance for helping me with my current problem. If you would like to see more code or have questions, I will be glad to supply it and answer anything I can.
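
One possible alternative to the JPEG round trip is to convert the NV21 bytes to ARGB pixels directly. The following is a rough sketch in plain Java, not a tuned implementation. It assumes even frame dimensions, the standard NV21 layout (a full-resolution Y plane followed by interleaved V/U samples at half resolution), and full-range JFIF-style conversion coefficients; `Nv21Converter` and `toArgb` are hypothetical names.

```java
final class Nv21Converter {

    /** Converts an NV21 frame to packed ARGB pixels (alpha fixed at 0xFF). */
    static int[] toArgb(byte[] nv21, int width, int height) {
        int frameSize = width * height;
        int[] argb = new int[frameSize];
        for (int row = 0; row < height; row++) {
            // Each chroma row is shared by two luma rows.
            int uvRow = frameSize + (row >> 1) * width;
            for (int col = 0; col < width; col++) {
                int y = nv21[row * width + col] & 0xFF;
                // Each V/U pair is shared by two horizontally adjacent pixels.
                int uvIndex = uvRow + (col & ~1);
                int v = (nv21[uvIndex] & 0xFF) - 128;
                int u = (nv21[uvIndex + 1] & 0xFF) - 128;
                int r = clamp(Math.round(y + 1.402f * v));
                int g = clamp(Math.round(y - 0.344f * u - 0.714f * v));
                int b = clamp(Math.round(y + 1.772f * u));
                argb[row * width + col] = 0xFF000000 | (r << 16) | (g << 8) | b;
            }
        }
        return argb;
    }

    private static int clamp(int value) {
        return Math.max(0, Math.min(255, value));
    }
}
```

On Android, the resulting array could then be passed to Bitmap.createBitmap(int[], int, int, Bitmap.Config) with Bitmap.Config.ARGB_8888. Whether this is actually faster than the JPEG route on a given device would need to be measured.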

Greetings

You can browse through the whole API source code on GitHub through the Pixa class and GetCharacters() links; I can't insert more hyperlinks.

Jones asked Jul 14 '12 12:07


1 Answer

TessTBaseApi.getWords().getBoxRects() will return an ArrayList of bounding box Rects with coordinates relative to your bmp bitmap.

rmtheis answered Oct 29 '22 05:10