
OCR (Tesseract): intelligent rotation for images

I'm developing an Android app that uses Tesseract OCR to recognize text. My problem is that different smartphones rotate the captured image differently: on one device it arrives in landscape orientation, on another in portrait. I want to rotate the image intelligently so that Tesseract can recognize the text. The text is readable in only one of the two orientations, but either can occur, depending on how the user takes the picture. I don't want to force the user to take the picture the same way every time; I want to rotate the image as needed, if possible without much of a performance loss.

The Tesseract library's auto-rotation does not seem to work for me in this case. Does anybody have an idea how to solve this problem?

Thanks

Lenny asked Aug 28 '13 12:08




2 Answers

If this question is still relevant for you: maybe you can extract the EXIF data of the image to get its orientation?
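As a rough sketch of the EXIF idea: the EXIF Orientation tag stores a small integer, and the pure-rotation values map to 0, 90, 180 or 270 degrees. The class and method names below are hypothetical and only illustrate the mapping; mirrored orientations are deliberately ignored.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class ExifRotation {
    // EXIF Orientation tag (0x0112) values for the pure-rotation cases.
    static final int NORMAL = 1;
    static final int ROTATE_180 = 3;
    static final int ROTATE_90 = 6;
    static final int ROTATE_270 = 8;

    /** Returns the clockwise rotation in degrees needed to show the image upright. */
    static int degreesFor(int exifOrientation) {
        switch (exifOrientation) {
            case ROTATE_90:
                return 90;
            case ROTATE_180:
                return 180;
            case ROTATE_270:
                return 270;
            default:
                // Normal (1), undefined (0) and mirrored values (2, 4, 5, 7)
                // are treated as "no rotation" in this sketch.
                return 0;
        }
    }
}
```

On Android the tag value would typically come from `ExifInterface.getAttributeInt(ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL)`, and the resulting angle could be applied with `Matrix.postRotate` and `Bitmap.createBitmap` before the bitmap is handed to Tesseract. If a device does not write the tag, this falls back to 0 and the image stays as it is.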

Otherwise, this paper may help you: Combined Orientation and Script Detection using the Tesseract OCR Engine.
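For a feel of what orientation and script detection (OSD) gives you in practice, here is a minimal sketch that reads the suggested rotation out of an OSD text report. It assumes the report looks like the one the Tesseract command line prints in OSD-only mode (lines such as `Orientation in degrees: 270` and `Rotate: 90`); the class name is hypothetical, and how such a report is obtained inside an Android app is not covered here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes an OSD report with a line like "Rotate: 90".
final class OsdReport {
    private static final Pattern ROTATE =
            Pattern.compile("^Rotate:\\s*(\\d+)\\s*$", Pattern.MULTILINE);

    /** Returns the value of the "Rotate" line, or -1 if the report has none. */
    static int rotateDegrees(String report) {
        Matcher m = ROTATE.matcher(report);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }
}
```

A caller would rotate the image by the returned angle before running the actual recognition, and skip the rotation when the result is 0 or -1.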

Alexander Taubenkorb answered Nov 07 '22 11:11


If you don't mind rolling your sleeves up, Leptonica (http://www.leptonica.org/) is probably a good option for evaluating the glyphs (the raw Pix that has not yet been detected as text) and determining the orientation. I've seen references to Android bindings for Leptonica.

dhartford answered Nov 07 '22 09:11