
Locating text within an image

I am currently working on a project whose goal is to locate text in an image. I do not intend to OCR the text yet; I only want to obtain the bounds of the text within the image. I am using the AForge.NET imaging component for manipulation. Can anyone offer some pointers?

Update 2/5/09: I have since gone another route in my project. However, I did attempt to extract text using MODI (Microsoft Office Document Imaging). It lets you OCR an image and pull the text from it fairly easily.

Pat asked Aug 05 '08 03:08



2 Answers

This is an active area of research, and there are oodles of academic papers on the subject. It is going to be difficult to give you assistance, especially without more details. Are you looking for specific types of text? Specific fonts? English only? Are you familiar with the academic literature?

"Text detection" is a standard problem in any OCR (optical character recognition) system, and consequently there is a lot of code on the web that deals with it.

I could start listing piles of links from Google, but I suggest you just search for "text detection" and start reading :). There is ample example code available as well.
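
To give a rough idea of what the simplest kind of text-detection code looks like, here is a minimal sketch in plain Python. It is not AForge.NET code and not taken from any particular paper; it only illustrates one common starting point: threshold the image, find connected blobs of "ink", and merge neighbouring blobs into line-level bounding boxes. The function names (find_components, merge_into_lines, locate_text) and the parameters (threshold, max_gap) are made up for this example, and it assumes dark text on a light background with the image given as a nested list of grayscale values from 0 to 255.

```python
from collections import deque


def find_components(binary):
    """Return (top, left, bottom, right) boxes of 4-connected foreground blobs."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill, tracking the extent of this blob.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge_into_lines(boxes, max_gap=2):
    """Greedily merge boxes that overlap vertically and are separated
    by at most max_gap background columns (assumed heuristic)."""
    merged = []
    for box in sorted(boxes, key=lambda b: b[1]):  # process left to right
        for i, m in enumerate(merged):
            overlaps_vertically = box[0] <= m[2] and m[0] <= box[2]
            gap = box[1] - m[3] - 1  # background columns between the boxes
            if overlaps_vertically and gap <= max_gap:
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged


def locate_text(gray, threshold=128, max_gap=2):
    """Binarize (dark pixels count as ink) and return merged bounding boxes."""
    binary = [[pixel < threshold for pixel in row] for row in gray]
    return merge_into_lines(find_components(binary), max_gap)
```

Calling locate_text(image) returns a list of (top, left, bottom, right) tuples in pixel coordinates. Real images need far more than this (noise filtering, size and aspect-ratio checks to reject blobs that are not text, handling of light text on dark backgrounds), which is where the literature comes in.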

Louis Brandy answered Sep 22 '22 10:09


Recognizing text inside an image is indeed a hot topic for researchers in that field, but it only really took off when CAPTCHAs became the "norm" as a defense against spam bots. Why use CAPTCHAs as protection? Because it is (or was) very hard to locate and read text inside an image!

The reason I mention CAPTCHAs is that the most advancement* has been made within that small area, and I think your solution is best found there, especially because CAPTCHAs are all about locating text (or something that resembles text) inside a cluttered image and then trying to read the letters correctly.

So if you can find a good open-source CAPTCHA-breaking tool, you probably have all you need to continue your quest.
You could probably even throw away the most difficult code, the part that handles the character recognition itself, because those OCR routines are built to read distorted text, which is something you don't have to do.

*: advancement in terms of visible, usable, and practical information for a "non-researcher"

sven answered Sep 21 '22 10:09