 

extract characters from image


I'm trying to extract (not recognize!) characters from a black & white image,
so if the image shows 123, I get an array of 3 images.

It's a duplicate question, I know, but I couldn't find what I want. I also tried looking through CodeProject but couldn't find a working example. The closest I found is this article, but its source code is not complete:

http://www.codeproject.com/Articles/143059/Neural-Network-for-Recognition-of-Handwritten-Digi


Your help is much appreciated :)

Salma Nafady asked Feb 18 '12 15:02



2 Answers

As Kenny has already mentioned, "connected component labeling" describes a family of algorithms that identify connected pixels. Connected components also go by the name of "connected regions" or "blobs", and are closely related to the concept of "contours." Any such algorithm should be able to find not only a shape of connected foreground pixels, but also the presence of "holes" inside the shape consisting of pixels of the background color.

http://en.wikipedia.org/wiki/Connected-component_labeling
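
To make the idea concrete, here is a minimal sketch of the classic two-pass labeling scheme with a union-find table, in plain Python. It is only one member of the family described above, not a reference implementation. The function name `label_components`, the list-of-rows image format (truthy = foreground) and the choice of 4-connectivity are my own assumptions for illustration.

```python
def label_components(image):
    """Two-pass connected-component labeling (4-connectivity).

    `image` is a list of rows; truthy values are foreground pixels.
    Returns (labels, count): a grid of the same shape where 0 is background
    and 1..count identify the components.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # union-find forest; index 0 is reserved for background

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Pass 1: assign provisional labels and record which labels touch.
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up and left:
                labels[y][x] = min(up, left)
                union(up, left)
            elif up or left:
                labels[y][x] = up or left
            else:
                parent.append(len(parent))  # new label is its own root
                labels[y][x] = len(parent) - 1

    # Pass 2: replace each provisional label by a compact final label.
    final = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```

Every pixel that shares a label belongs to the same blob, so collecting the pixels per label gives you one shape per character. Holes are not reported by this sketch; one way to find them would be to run the same labeling on the inverted image and keep the background regions that do not touch the image border.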

This algorithm is used in several engineering fields that rely on image processing, including computer vision, machine vision, and medical imaging. If you're going to spend any amount of time in image processing, you should become very comfortable with this algorithm and implement it at least once on your own.

The OpenCV library has a findContours() function that can be used to find contours, contours within contours, etc.
http://opencv.willowgarage.com/wiki/

If you'd like to see a region-labeling algorithm at work, look for references to "cell counting" using the application ImageJ. Counting biological cells is an important and oft-cited application of region labeling for medical imaging.

http://rsbweb.nih.gov/ij/

Consider getting a textbook on the subject rather than learning piecemeal online. Studying connected components (a.k.a. blobs) inevitably leads to consideration of binarization (a.k.a. thresholding), which takes a grayscale or color image and generates a black and white image from it. If you're working with images from a camera, then lighting becomes critical, and that takes time and tinkering to learn.
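
As an illustration of binarization, here is a small sketch that picks a global threshold with Otsu's method and then maps a grayscale image to 1 (dark ink) and 0 (background). Otsu's method is my choice for the example, one common option among many; the function names and the list-of-rows format with integer values 0..255 are assumptions.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, the rest form the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray):
    """Map a grayscale image (rows of 0..255 ints) to 1 = dark ink, 0 = background."""
    t = otsu_threshold([p for row in gray for p in row])
    return [[1 if p <= t else 0 for p in row] for row in gray]
```

A single global threshold like this tends to work for evenly lit scans. With uneven lighting from a camera, a locally adaptive threshold is usually the next thing to try.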

There are a host of other preprocessing steps that may be necessary to clean up the image. The need for preprocessing depends on your application.

Here's a textbook that is often recommended, and that gives good coverage of standard image processing techniques:

Digital Image Processing by Gonzalez and Woods, 3rd edition http://www.imageprocessingplace.com/

Go to addall.com to find cheap copies. International editions are cheaper.

If the characters (or other shapes) in the image are of a consistent size and shape--for example, an "A" is always 40 pixels tall and 25 pixels wide and is machine printed in the same font--then you might use a "normalized cross-correlation" or template-matching technique to identify the presence of one or more matching shapes. This technique can work as a very crude sort of OCR, but has severe limitations.

http://en.wikipedia.org/wiki/Template_matching
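
For a feel of what normalized cross-correlation does, here is a brute-force sketch in plain Python. It slides a template over every position of the image and scores each position between -1 and 1. The names `ncc` and `best_match` and the list-of-rows format are assumptions for illustration; a real application would more likely use a library routine, since this version is slow on anything but tiny images.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized grids, in [-1, 1]."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # a flat patch or template carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(image, template):
    """Slide the template over the image; return (score, x, y) of the best position."""
    th, tw = len(template), len(template[0])
    best = (-2.0, 0, 0)
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, x, y)
    return best
```

A score near 1 means the patch has the same shape as the template up to brightness and contrast. One of the limitations mentioned above shows up directly in the code: the template is compared at a single size and orientation only, so a scaled or rotated character will score poorly.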

Rethunk answered Sep 23 '22 06:09



If your image represents black characters on a white background (or vice versa), if the image is of reasonable quality, if the lines of text are horizontal, and if each character is separated from its neighbours, then it is a relatively trivial operation to find all the little islands of black pixels in the sea of white.

As each of these ifs is relaxed, the problem becomes harder but remains the same conceptually: find a black pixel, then find all the other black pixels to which it is connected, and you have found a character. Or, bearing in mind the comments about OCR and your requirement, you have found a patch of black pixels which (you assert) represents a character.
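
The "find a black pixel, then find everything connected to it" idea can be sketched as a flood fill that returns one cropped sub-image per island, which is the array of images the question asks for. This is only an illustration under the easy conditions listed above. The function name `extract_characters`, the encoding (1 = black, 0 = white), the use of 8-connectivity and the left-to-right ordering are my assumptions.

```python
from collections import deque


def extract_characters(image):
    """Split a binary image into one cropped sub-image per connected blob.

    `image` is a list of rows with 1 = black (ink) and 0 = white.
    Blobs are returned left to right, each as its own list of rows.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or seen[y0][x0]:
                continue
            # Flood fill from this seed pixel (8-connectivity).
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for ny in range(max(y - 1, 0), min(y + 2, height)):
                    for nx in range(max(x - 1, 0), min(x + 2, width)):
                        if image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            blobs.append(pixels)

    characters = []
    for pixels in sorted(blobs, key=lambda p: min(x for x, _ in p)):
        left = min(x for x, _ in pixels)
        top = min(y for _, y in pixels)
        w = max(x for x, _ in pixels) - left + 1
        h = max(y for _, y in pixels) - top + 1
        # Paint only this blob's pixels, so a neighbour that overlaps the
        # bounding box does not leak into the crop.
        crop = [[0] * w for _ in range(h)]
        for x, y in pixels:
            crop[y - top][x - left] = 1
        characters.append(crop)
    return characters
```

For an image showing 123 with separated digits, this would return three crops. As noted above, each crop is just a patch of connected black pixels: touching characters come back as one patch, and a character made of several pieces comes back as several.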

High Performance Mark answered Sep 21 '22 06:09
