I am working on an application Which need to convert the jpeg image to text so that I can identify the text written there in the image. plz give me a guidence to do that .
EXTRACT FROM Making OCR app using Tesseract.
Note: These instructions are for Android SDK r19 and Android NDK r7c. On 64-bit Ubuntu, you may need to install the ia32-libs 32-bit compatibility library. You would also need proper PATH variables added.
Download the source or clone this git repository. This project contains tools for compiling the Tesseract, Leptonica, and JPEG libraries for use on Android. It contains an Eclipse Android library project that provides a Java API for accessing natively-compiled Tesseract and Leptonica APIs. You don’t need eyes-two code, you can do without it.
Build this project using these commands (here, tess-two is the directory inside tess-two – the one at the same level as of tess-two-test):
cd <project-directory>/tess-two
ndk-build
android update project --path .
ant release
Now import the project as a library in Eclipse.
File -> Import -> Existing Projects into workspace -> tess-two directory<code>. Right click the project, Android Tools -> Fix Project Properties. Right click -> Properties -> Android -> Check Is Library
Configure your project to use the tess-two project as a library project:
Right click your project name -> Properties -> Android -> Library -> Add, and choose tess-two.
You’re now ready to OCR any image using the library.
First, we need to get the picture itself. For that, I found a simple code to capture the image here. After we have the bitmap, we just need to perform the OCR which is relatively easy. Be sure to correct the rotation and image type by doing something like:
// _path = path to the image to be OCRed
ExifInterface exif = new ExifInterface(_path);
int exifOrientation = exif.getAttributeInt(
ExifInterface.TAG_ORIENTATION,
ExifInterface.ORIENTATION_NORMAL);
int rotate = 0;
switch (exifOrientation) {
case ExifInterface.ORIENTATION_ROTATE_90:
rotate = 90;
break;
case ExifInterface.ORIENTATION_ROTATE_180:
rotate = 180;
break;
case ExifInterface.ORIENTATION_ROTATE_270:
rotate = 270;
break;
}
if (rotate != 0) {
int w = bitmap.getWidth();
int h = bitmap.getHeight();
// Setting pre rotate
Matrix mtx = new Matrix();
mtx.preRotate(rotate);
// Rotating Bitmap & convert to ARGB_8888, required by tess
bitmap = Bitmap.createBitmap(bitmap, 0, 0, w, h, mtx, false);
}
bitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true);
Now we have the image in the bitmap, and we can simply use the TessBaseAPI to run the OCR like:
TessBaseAPI baseApi = new TessBaseAPI();
// DATA_PATH = Path to the storage
// lang = for which the language data exists, usually "eng"
baseApi.init(DATA_PATH, lang);
// Eg. baseApi.init("/mnt/sdcard/tesseract/tessdata/eng.traineddata", "eng");
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
baseApi.end();
(You can download the language files from [here][2] and put them in a directory on your device – manually or by code)
Now that you’ve got the OCRed text in the variable recognizedText, you can do pretty much anything with it – translate, search, anything! ps. You can add various language support by having a preference and then downloading the required language data file from here. You might even put them in the assets folder and copy them to the SD card on start.
Troubleshooting
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With