 

Word segmentation using OpenCV [closed]

I am working on some scanned text images and I need to highlight all the words in each image. I know the problem is equivalent to finding subimages with extra whitespace around them.

OCR cannot be used, and I just need to outline each word with a border. Can someone suggest how this might be done using OpenCV?

I have tried reading about thresholding and segmentation. I am just looking for someone to point me to some relevant material.

code4fun asked Oct 06 '12 23:10



1 Answer

I think your image contains multi-line text. In that case, the first thing to do is detect the individual lines.

For that, first binarize the image using Otsu's method or adaptive thresholding.

Then you can use what is called a "horizontal histogram". It is like an ordinary histogram, but it counts the text pixels in each row, so it shows where there are text lines and where there are blank spaces. Divide the image at the blank rows, and you get each line. Below is an image of a horizontal histogram.

Horizontal histogram
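
A minimal sketch of splitting a page into lines with a horizontal histogram, assuming a binarized image in which text pixels are nonzero. The helper names `find_runs` and `split_lines` and the `min_count` parameter are made up for illustration.

```python
import numpy as np

def find_runs(profile, min_count=1):
    """Return (start, end) pairs of consecutive entries >= min_count; end is exclusive."""
    occupied = np.asarray(profile) >= min_count
    # Pad with False so that every run has a detectable start and end.
    padded = np.concatenate(([False], occupied, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[::2], changes[1::2])]

def split_lines(binary):
    """Split a binary image (text = nonzero) into text lines at blank rows."""
    row_profile = np.count_nonzero(binary, axis=1)  # the horizontal histogram
    return find_runs(row_profile)

# Synthetic binary page with two "text lines".
binary = np.zeros((40, 50), dtype=np.uint8)
binary[5:12, 3:45] = 255
binary[20:28, 3:30] = 255

lines = split_lines(binary)  # [(5, 12), (20, 28)]
line_images = [binary[top:bottom] for top, bottom in lines]
```

On noisy scans, blank rows are rarely perfectly empty, so raising `min_count` above 1 is one way to tolerate a few stray pixels.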

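Now, for each line, compute a histogram again, this time across the columns (a vertical histogram), to see where the gaps between words fall. Before that, try applying some dilation and erosion so that the letters of each word are grouped together. Then you can find the connected components on each line to get each word, and draw a boundary around each one.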

The image below shows both horizontal and vertical histograms:

horizontal and vertical histograms

This Stack Overflow question might help: How to convert an image into character segments?

Abid Rahman K answered Oct 15 '22 06:10
