Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Detecting comic strip dialogue bubble regions in images

I have an grayscale image of a comic strip page that features several dialogue bubbles (=speech baloons, etc), that are enclosed areas with white background and solid black borders that contain text inside, i.e. something like that:

Sample comic strip image

I want to detect these regions and create a mask (binary is ok) that will cover all the inside regions of dialogue bubbles, i.e. something like:

Sample resulting mask image

The same image, mask overlaid, to be totally clear:

Sample image with transparent mask overlay

So, my basic idea of the algorithm was something like:

  1. Detect where the text is — plant at least one pixel in every bubble. Dilate these regions somewhat and apply threshold to get a better starting ground; I've done this part:

Text positions outlined

  1. Use a flood fill or some sort of graph traversal, starting from every white pixel detected as a pixel-inside-bubble on step 1, but working on initial image, flooding white pixels (which are supposed to be inside the bubble) and stopping on dark pixels (which are supposed to be borders or text).

  2. Use some sort of binary_closing operation to remove dark areas (i.e. regions that correspond to text) inside bubbles). This part works ok.

So far, steps 1 and 3 work, but I'm struggling with step 2. I'm currently working with scikit-image, and I don't see any ready-made algorithms like flood fill implemented there. Obviously, I can use something trivial like breadth-first traversal, basically as suggested here, but it's really slow when done in Python. I suspect that intricate morphology stuff like binary_erosion or generate_binary_structure in ndimage or scikit-image, but I struggle to understand all that morphology terminology and basically how do I implement such a custom flood fill with it (i.e. starting with step 1 image, working on original image and producing output to separate output image).

I'm open to any suggestions, including ones in OpenCV, etc.

like image 822
GreyCat Avatar asked Dec 18 '15 13:12

GreyCat


People also ask

How do you read a comic speech bubble?

Speech bubbles are read in a similar way to frames. We start by reading the highest one, and then the dialogue unspools from top to bottom. Once you have composed your page, it is a good idea to trace a line from one speech bubble to the next, in the order that you want people to read them.

What are the bubbles in comic strips called?

Speech balloons (also speech bubbles, dialogue balloons, or word balloons) are a graphic convention used most commonly in comic books, comics, and cartoons to allow words (and much less often, pictures) to be understood as representing a character's speech or thoughts.

Where do speech bubbles go in a comic?

Bubbles are placed on a page in a precise order. We always start by reading the bubble that is highest in the frame, then the next one down, and so on. When two or more frames are next to each other, we read them from left to right. The tip of the tail points to the character who is speaking.

What is a comic bubble?

The visual tool used to represent the speech, dialogue, or conversation of the characters in the comics is called “bubble”. The meaning of speech bubbles in comics will be addressed, with emphasis on their proper use and comics grammar.


1 Answers

Even though your actual question is concerning step 2 of your processing pipeline, I would like to suggest another approach, that might be, imho, simpler and as you stated that you are open to suggestions.

  1. Using the image from your original step 1 you could create an image without text in the bubbles.

    Implemented

  2. Detect edges on the original image with removed text. This should work well for the speech bubbles, as the bubble edges are pretty distinct.

    Edge detection

  3. Finally use the edge image and the initially detected "text locations" in order to find those areas within the edge image that contain text.

    Watershed-Segmentation

I am sorry for this very general answer, but here it's too late for actual coding for me, but if the question is still open and you need/want some more hints concerning my suggestion, I will elaborate it in more detail. But you could definitely have a look at the Region based segmentation in the scikit-image docs.

like image 75
Bubblbu Avatar answered Sep 22 '22 23:09

Bubblbu