I am working on some scanned text images and I need to highlight all the words in each image. I know the problem is equivalent to finding subimages with extra whitespace around them.
OCR cannot be used; I just need to outline each word with a border. Can someone suggest how this might be done using OpenCV?
I have tried reading about thresholding and segmentation. I am just looking for someone to point me to some relevant material.
I think your image contains multiple lines of text. In that case, the first thing you have to do is detect those lines.
For that, first binarize the image using Otsu's method or adaptive thresholding.
Then you can use what is called a "horizontal histogram". It is like an ordinary histogram, but it counts the text pixels in each row, so it shows where there are lines of text and where there are blank spaces. Divide the image at the blank rows and you get each line. Below is the image of a horizontal histogram.
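A rough sketch of the horizontal-histogram idea, assuming a binary image in which text pixels are nonzero. The name `find_text_lines` is hypothetical, and treating any row with zero ink as a separator is a simplification; noisy scans will likely need a small tolerance instead of an exact zero.

```python
import numpy as np

def find_text_lines(binary):
    """Return (top, bottom) row ranges (bottom exclusive) that contain ink."""
    # Horizontal histogram: number of text pixels in each row.
    row_sums = (binary > 0).sum(axis=1)
    has_ink = row_sums > 0

    lines = []
    start = None
    for y, ink in enumerate(has_ink):
        if ink and start is None:
            start = y                    # first inked row of a new line
        elif not ink and start is not None:
            lines.append((start, y))     # blank row closes the current line
            start = None
    if start is not None:                # text runs to the bottom edge
        lines.append((start, len(has_ink)))
    return lines

# Synthetic example: two bands of "text" separated by blank rows.
demo = np.zeros((20, 30), dtype=np.uint8)
demo[2:6, 3:20] = 255
demo[10:15, 5:25] = 255
line_ranges = find_text_lines(demo)
```

Each `(top, bottom)` pair can then be used to slice out one line, for example `demo[top:bottom, :]`.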
Now, for each line, find the horizontal histogram. Before that, try some dilation and erosion so that the letters of each word are grouped together. Then you can find the connected components on each line to get each word, and draw a boundary around each one.
The image below shows both horizontal and vertical histograms:
This SO question might help: How to convert an image into character segments?