I want to make a program that takes an image as input and outputs text. I know that I can use a neural network to turn an image of a single character into that character. The difficult part is: given an image with text in it, how would I produce the rectangles around each individual character? What method could I use to do it?
A basic approach is to make histograms of black pixels (projection profiles). First, project all pixels onto the vertical axis, i.e. count the black pixels in each row. The deep valleys in the histogram indicate the separation between lines of text (try different angles if the paper might be tilted). Then, per line (or per page if you know the font is monospaced), project the pixels onto the horizontal axis, i.e. count the black pixels in each column. This will give you a strong indication of the inter-character spaces. At a minimum, this gives you values for the average character height and width that will help you in the next steps.
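The two projections described above can be sketched roughly as follows. This is a minimal illustration, not a complete segmenter: it assumes the image is already binarized into a list of rows with 1 for ink and 0 for background, that the page is not tilted, and that a gap is any row or column with no ink at all. The names `find_runs` and `segment_characters` and the `threshold` parameter are made up for this example.

```python
def find_runs(profile, threshold=0):
    """Return (start, end) pairs, end exclusive, of consecutive entries above threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # a valley ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(image):
    """Return boxes (top, left, bottom, right), exclusive ends, one per character candidate."""
    boxes = []
    # Row histogram: black pixels per row; valleys separate text lines.
    row_profile = [sum(row) for row in image]
    for top, bottom in find_runs(row_profile):
        line = image[top:bottom]
        # Column histogram for this line only; valleys separate characters.
        col_profile = [sum(column) for column in zip(*line)]
        for left, right in find_runs(col_profile):
            boxes.append((top, left, bottom, right))
    return boxes
```

On a noisy scan the valleys will rarely drop to exactly zero, so a small positive `threshold` (an assumption to tune per document) is probably needed. The widths and heights of the resulting boxes can then be averaged to estimate the typical character size.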
After that, you need to take care of kerning (where the bounding boxes of adjacent characters overlap, so no empty column separates them). Find the connected pixels, possibly by first applying dilation or erosion to the image to compensate for scanning artifacts.
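A rough sketch of the connected-pixel step, under the same assumptions as before (a binarized image stored as a list of rows, 1 for ink). It uses a breadth-first flood fill with 8-connectivity and a simple 3x3 dilation; the function names `dilate` and `component_boxes` are invented for this example, and an image library would normally provide faster equivalents.

```python
from collections import deque


def dilate(image):
    """3x3 binary dilation: every ink pixel also inks its 8 neighbours."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x]:
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        out[ny][nx] = 1
    return out


def component_boxes(image):
    """Return a box (top, left, bottom, right), exclusive ends, per 8-connected blob."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not image[y][x] or seen[y][x]:
                continue
            # Flood-fill one blob, growing its bounding box as we go.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y + 1, x + 1
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy + 1)
                left, right = min(left, cx), max(right, cx + 1)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Unlike the column histogram, this separates two glyphs whose column ranges overlap as long as their pixels do not touch. Dilating first can rejoin strokes that the scan broke apart, but it also enlarges the boxes slightly and may merge characters that sit very close together, so whether to apply it is a judgment call per document. Multi-part glyphs such as "i" come out as separate blobs and would need to be merged, for instance using the average character size from the histogram step.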
Depending on the quality of the scanned image, you may have to use more advanced techniques, but this will get you going.
This doesn't sound like artificial intelligence, it sounds like you're talking about OCR:
http://en.wikipedia.org/wiki/Optical_character_recognition
See Google's Tesseract:
http://code.google.com/p/tesseract-ocr/
EDIT: The unedited question was asking about artificial intelligence.