iOS Tesseract OCR Image Preparation

I would like to implement an OCR application that recognizes text in photos.

I succeeded in compiling and integrating the Tesseract engine in iOS, and I get reasonable detection when photographing clear documents (or a photo of this text taken from the screen). For other text, such as signposts, shop signs, or text on a coloured background, detection fails.

The question is: what kind of image processing preparation is necessary to get better recognition? For example, I expect that we need to convert the images to grayscale or black and white, as well as fix the contrast, etc.

How can this be done in iOS? Is there a package for this?

asked Nov 22 '12 10:11

alandalusi

1 Answer

I'm currently working on the same thing. I found that a PNG saved in Photoshop worked fine, but an image originally sourced from the camera and then imported into the app never worked. Don't ask me to explain it, but applying this function made these images work. Maybe it'll work for you too.

// this does the trick to have tesseract accept the UIImage.
UIImage * gs_convert_image (UIImage * src_img) {
    CGColorSpaceRef d_colorSpace = CGColorSpaceCreateDeviceRGB();
    /*
     * Note we specify 4 bytes per pixel here even though we ignore the
     * alpha value; you can't specify 3 bytes per-pixel.
     */
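    // Truncate the point size to whole pixels once, so the buffer size and
    // the context dimensions always agree.
    size_t d_width = (size_t)src_img.size.width;
    size_t d_height = (size_t)src_img.size.height;
    size_t d_bytesPerRow = d_width * 4;
    unsigned char * imgData = (unsigned char*)malloc(d_height * d_bytesPerRow);
    if (imgData == NULL) {
        // Allocation failed: release what we created and bail out.
        CGColorSpaceRelease(d_colorSpace);
        return nil;
    }
    CGContextRef context =  CGBitmapContextCreate(imgData, d_width,
                                                  d_height,
                                                  8, d_bytesPerRow,
                                                  d_colorSpace,
                                                  kCGImageAlphaNoneSkipFirst);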

    UIGraphicsPushContext(context);
    // These next two lines 'flip' the drawing so it doesn't appear upside-down.
    CGContextTranslateCTM(context, 0.0, src_img.size.height);
    CGContextScaleCTM(context, 1.0, -1.0);
    // Use UIImage's drawInRect: instead of the CGContextDrawImage function, otherwise you'll have issues when the source image is in portrait orientation.
    [src_img drawInRect:CGRectMake(0.0, 0.0, src_img.size.width, src_img.size.height)];
    UIGraphicsPopContext();

    /*
     * At this point, we have the raw ARGB pixel data in the imgData buffer, so
     * we can perform whatever image processing here.
     */


    // After we've processed the raw data, turn it back into a UIImage instance.
    CGImageRef new_img = CGBitmapContextCreateImage(context);
    UIImage * convertedImage = [[UIImage alloc] initWithCGImage:
                                 new_img];

    CGImageRelease(new_img);
    CGContextRelease(context);
    CGColorSpaceRelease(d_colorSpace);
    free(imgData);
    return convertedImage;
}

I've also done a lot of experimentation preparing the image for Tesseract. Resizing, converting to grayscale, then adjusting brightness and contrast seems to work best.
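As a rough illustration of the grayscale and contrast steps, here is a minimal plain-C sketch that works directly on the 4-bytes-per-pixel buffer produced by the function above. The function name, the parameters, and the linear contrast formula are my own assumptions for illustration, not something taken from Tesseract or from the code I actually shipped. It also assumes the bytes are laid out as (unused alpha, R, G, B), which is what the ARGB comment in the function above implies.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp a value to the 0..255 range of one colour channel. */
static unsigned char clamp_channel(long v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (unsigned char)v;
}

/*
 * Hypothetical helper: convert a 4-bytes-per-pixel buffer (unused alpha
 * byte first, then R, G, B; this layout is an assumption) to grayscale in
 * place, then apply a linear contrast/brightness adjustment:
 *     out = (gray - 128) * contrast_percent / 100 + 128 + brightness
 * contrast_percent = 100 and brightness = 0 leave the gray value unchanged.
 */
void gs_grayscale_adjust(unsigned char *pixels, size_t width, size_t height,
                         size_t bytes_per_row, int contrast_percent,
                         int brightness) {
    assert(pixels != NULL);
    assert(bytes_per_row >= width * 4);
    for (size_t y = 0; y < height; y++) {
        unsigned char *row = pixels + y * bytes_per_row;
        for (size_t x = 0; x < width; x++) {
            unsigned char *p = row + x * 4; /* p[0] is the unused alpha byte */
            /* Integer approximation of 0.299 R + 0.587 G + 0.114 B. */
            long gray = (299L * p[1] + 587L * p[2] + 114L * p[3] + 500L) / 1000L;
            long adjusted = (gray - 128) * contrast_percent / 100 + 128 + brightness;
            p[1] = p[2] = p[3] = clamp_channel(adjusted);
        }
    }
}
```

Inside gs_convert_image this could be called at the point where the comment says the raw pixel data is available, passing imgData, the image width and height, and d_bytesPerRow. The contrast and brightness values that work best will depend on your images, so treat any numbers you start with as guesses to tune.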

I've also tried the GPUImage library (https://github.com/BradLarson/GPUImage). The GPUImageAverageLuminanceThresholdFilter seems to give me a great adjusted image, but Tesseract doesn't seem to work well with it.

I've also put OpenCV into my project and plan to try out its image routines, possibly even some box detection to find the text area (I'm hoping this will speed up Tesseract).
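For anyone who wants to try the same idea without GPUImage, below is a CPU sketch in plain C of thresholding against the average luminance. This is not GPUImage's implementation (that filter runs on the GPU); it is only my assumption of the general technique, with a hypothetical function name, applied to the same 4-bytes-per-pixel buffer (unused alpha, R, G, B) as the function above.

```c
#include <assert.h>
#include <stddef.h>

/* Integer approximation of 0.299 R + 0.587 G + 0.114 B for one pixel. */
static unsigned long luma_of(const unsigned char *p) {
    return (299UL * p[1] + 587UL * p[2] + 114UL * p[3] + 500UL) / 1000UL;
}

/*
 * Hypothetical helper: binarize the buffer in place. Pixels brighter than
 * the image's mean luminance become white, all others black.
 * Returns the threshold (the mean luminance) that was used.
 */
unsigned char gs_average_threshold(unsigned char *pixels, size_t width,
                                   size_t height, size_t bytes_per_row) {
    assert(pixels != NULL);
    assert(width > 0 && height > 0);
    assert(bytes_per_row >= width * 4);

    /* Pass 1: accumulate the luminance of every pixel. */
    unsigned long long sum = 0;
    for (size_t y = 0; y < height; y++) {
        const unsigned char *row = pixels + y * bytes_per_row;
        for (size_t x = 0; x < width; x++) {
            sum += luma_of(row + x * 4);
        }
    }
    unsigned char threshold =
        (unsigned char)(sum / ((unsigned long long)width * height));

    /* Pass 2: compare each pixel against the mean and write black or white. */
    for (size_t y = 0; y < height; y++) {
        unsigned char *row = pixels + y * bytes_per_row;
        for (size_t x = 0; x < width; x++) {
            unsigned char *p = row + x * 4;
            unsigned char v = (luma_of(p) > threshold) ? 255 : 0;
            p[1] = p[2] = p[3] = v;
        }
    }
    return threshold;
}
```

A single global threshold like this is easy to reason about, but it can wipe out text on unevenly lit signs, which may be part of why the thresholded image looks good to the eye yet does not help recognition.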

answered Sep 30 '22 03:09

roocell


Tags: ios, image-processing, ocr, tesseract
