
Character segmentation of captcha image

I'm trying to crack a CAPTCHA image but I can't find a way to segment the characters.

I have this image: enter image description here

I applied some filters and thresholding, which resulted in this image: enter image description here

Now I need to segment the image to be used in a classifier such as SVM or ANN.

The problem is that some characters are connected, and I couldn't find a way to separate them.

More image examples:

enter image description here

enter image description here

enter image description here

Does anyone have an approach to segment the image and extract the characters?

jonhkr asked Mar 20 '23 09:03


2 Answers

It seems to me that your characters have a maximum stroke width. Whenever you find a horizontal run of black pixels that is longer than this width, it indicates that two characters are joined there.

So:

  • for each connected blob that is wider than a single character
    • for each row of that blob
      • find all uninterrupted horizontal lines of black pixels in this row that are longer than MAX_STROKE_WIDTH
      • note the X-coordinate of the center of these lines
    • cluster the found X-coordinates
    • split the blob at the center of each cluster with more than N coordinates
    • (you can either simply split vertically, or try to fit a line through the points in the cluster)
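
The steps above could be sketched roughly like this. It is only an illustration under a few assumptions: the blob is already binarized into rows of 0/1 values (1 = black), the clustering is a naive "gap between sorted centers" grouping, and the split is the simple vertical one. The names find_split_columns, split_blob, min_votes (the N above) and cluster_gap are made up for this example.

```python
def find_split_columns(blob, max_stroke_width, min_votes, cluster_gap=2):
    """Return x-coordinates at which a joined blob should be split.

    blob is a list of rows; each row is a list of 0/1 values (1 = black).
    """
    centers = []
    for row in blob:
        x = 0
        while x < len(row):
            if not row[x]:
                x += 1
                continue
            start = x
            while x < len(row) and row[x]:
                x += 1
            # The black run covers columns start .. x-1.
            if x - start > max_stroke_width:
                centers.append((start + x - 1) / 2)

    # Naive 1-D clustering: a sorted center joins the current cluster
    # if it lies within cluster_gap of the previous center.
    clusters = []
    for c in sorted(centers):
        if clusters and c - clusters[-1][-1] <= cluster_gap:
            clusters[-1].append(c)
        else:
            clusters.append([c])

    # Keep clusters with more than min_votes members; split at their mean.
    return [int(sum(cl) / len(cl)) for cl in clusters if len(cl) > min_votes]


def split_blob(blob, columns):
    """Cut the blob vertically at the given columns (simple vertical split)."""
    edges = [0] + sorted(columns) + [len(blob[0])]
    return [[row[a:b] for row in blob] for a, b in zip(edges, edges[1:])]
```

MAX_STROKE_WIDTH and N would have to be tuned by hand on a few sample images. Fitting a line through each cluster instead of cutting vertically is not covered by this sketch.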
HugoRune answered Mar 21 '23 22:03


Your approach seems too bottom-up to me.
The number of characters is constant and they appear to be monospaced.
So just split the image into equal-width slices, one per character, and make sure the features you extract are rotation- and scale-invariant.
These can then be fed into an ANN. I don't see why you have to segment the characters.
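
A minimal sketch of the equal-width split, assuming the image is a list of pixel rows and the number of characters is known in advance. split_equal_width and num_chars are names invented for this example, and feature extraction is left out.

```python
def split_equal_width(image, num_chars):
    """Cut an image (list of pixel rows) into num_chars vertical slices.

    Slice boundaries use integer arithmetic, so when the width is not an
    exact multiple of num_chars the slice widths differ by at most one pixel.
    """
    width = len(image[0])
    edges = [i * width // num_chars for i in range(num_chars + 1)]
    return [[row[a:b] for row in image] for a, b in zip(edges, edges[1:])]
```

Each slice would then go through whatever invariant feature extraction you choose before being passed to the classifier.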

Vihari Piratla answered Mar 21 '23 22:03