I have a grayscale image of a comic strip page that features several dialogue bubbles (speech balloons): enclosed areas with a white background and solid black borders that contain text, i.e. something like this:
I want to detect these regions and create a mask (binary is OK) that covers all the inside regions of the dialogue bubbles, i.e. something like:
The same image with the mask overlaid, to be totally clear:
So my basic idea of the algorithm was something like:

1. Detect white pixels that are likely inside bubbles (this step works).
2. Use a flood fill or some sort of graph traversal: start from every white pixel detected as a pixel-inside-bubble in step 1, but work on the initial image, flooding white pixels (which are supposed to be inside the bubble) and stopping at dark pixels (which are supposed to be borders or text).
3. Use some sort of binary_closing operation to remove dark areas inside bubbles (i.e. regions that correspond to text). This part works OK.
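For reference, the closing step can be sketched as below; the toy mask and the 7×7 structuring element are assumptions (the element just has to be larger than the holes left by the text strokes):

```python
import numpy as np
from scipy import ndimage

# Toy binary mask: a filled "bubble" region with a hole where text was.
mask = np.ones((20, 20), dtype=bool)
mask[8:12, 8:12] = False  # hole left by dark text pixels

# Closing (dilation followed by erosion) with a structuring element larger
# than the hole fills it in.
closed = ndimage.binary_closing(mask, structure=np.ones((7, 7), dtype=bool))

print(closed[8:12, 8:12].all())  # True: the text hole is filled
```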
So far, steps 1 and 3 work, but I'm struggling with step 2. I'm currently working with scikit-image, and I don't see any ready-made flood fill algorithm implemented there. Obviously, I could use something trivial like a breadth-first traversal, basically as suggested here, but it's really slow when done in Python. I suspect that the intricate morphology machinery, like binary_erosion or generate_binary_structure in ndimage or scikit-image, could do the job, but I struggle to understand all the morphology terminology and, basically, how to implement such a custom flood fill with it (i.e. starting from the step 1 image, working on the original image, and producing the result in a separate output image).
I'm open to any suggestions, including ones in OpenCV, etc.
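One way to get a fast flood fill without hand-writing a BFS is morphological reconstruction (conditional dilation): the seed pixels are dilated repeatedly, but only within a mask of allowed pixels, which is exactly "flood white, stop at dark". scipy.ndimage.binary_propagation implements this in vectorized form. A minimal sketch on a toy image; the threshold of 128 and the single seed pixel are assumptions standing in for the step 1 output:

```python
import numpy as np
from scipy import ndimage

# Toy grayscale page: white background, one bubble with a black border,
# white interior, and a dark "text" pixel inside.
img = np.full((12, 12), 255, dtype=np.uint8)
img[2, 2:10] = img[9, 2:10] = 0   # top/bottom border
img[2:10, 2] = img[2:10, 9] = 0   # left/right border
img[5, 5] = 0                     # "text"

white = img > 128                 # pixels the flood is allowed to pass through

# Stand-in for the step 1 result: one seed pixel known to be inside the bubble.
seeds = np.zeros_like(white)
seeds[4, 4] = True

# Conditional dilation: grow the seeds, constrained to white pixels.
# This is a vectorized flood fill; it stops at the black border and at text.
inside = ndimage.binary_propagation(seeds, mask=white)

print(inside[3, 3])   # True: reached another pixel inside the bubble
print(inside[0, 0])   # False: the outside was never reached
```

Recent scikit-image releases also ship `skimage.segmentation.flood_fill`, which may be an option if upgrading is possible.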
Even though your actual question concerns step 2 of your processing pipeline, I would like to suggest another approach that might, imho, be simpler, since you stated that you are open to suggestions.
Using the image from your original step 1, you could create an image without text in the bubbles.
(image: the implemented text removal)
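A minimal sketch of that text-removal idea, assuming step 1 yields a boolean mask of text pixels (the toy image and mask below are made up for illustration):

```python
import numpy as np

# Toy grayscale image: a bubble border plus a "line of text" inside it.
img = np.full((10, 10), 255, dtype=np.uint8)
img[1, 1:9] = img[8, 1:9] = 0   # top/bottom border
img[1:9, 1] = img[1:9, 8] = 0   # left/right border
img[4, 3:7] = 0                 # text inside the bubble

# Stand-in for the step 1 output: a boolean mask of detected text pixels.
text_mask = np.zeros_like(img, dtype=bool)
text_mask[4, 3:7] = True

# Paint the text pixels white so only the bubble border remains dark.
no_text = img.copy()
no_text[text_mask] = 255

print((no_text[1, 1:9] == 0).all())    # True: border preserved
print((no_text[4, 3:7] == 255).all())  # True: text erased
```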
Detect edges on the original image with the text removed. This should work well for the speech bubbles, as the bubble edges are pretty distinct.
(image: edge detection result)
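As a stand-in for a full edge detector (skimage.feature.canny would be the usual choice), a Sobel gradient magnitude already picks out the sharp bubble border; the threshold of 100 is an assumption tuned to this toy contrast:

```python
import numpy as np
from scipy import ndimage

# Toy text-free image: white page, black bubble border (as after text removal).
img = np.full((10, 10), 255, dtype=float)
img[1, 1:9] = img[8, 1:9] = 0
img[1:9, 1] = img[1:9, 8] = 0

# Gradient magnitude highlights the sharp border transitions.
gx = ndimage.sobel(img, axis=1)
gy = ndimage.sobel(img, axis=0)
edges = np.hypot(gx, gy) > 100  # threshold is an assumption

print(edges[2, 4])  # True: right next to the border
print(edges[4, 4])  # False: flat interior
```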
Finally, use the edge image and the initially detected "text locations" to find those areas within the edge image that contain text.
(image: watershed segmentation result)
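A watershed sketch using scipy.ndimage.watershed_ift (scikit-image's watershed would work the same way); the marker positions below are assumptions: in practice the inside marker would come from the detected text locations and the outside marker from the page background:

```python
import numpy as np
from scipy import ndimage

# Toy edge image: high values on the bubble border, zero elsewhere.
edges = np.zeros((10, 10), dtype=np.uint8)
edges[1, 1:9] = edges[8, 1:9] = 255
edges[1:9, 1] = edges[1:9, 8] = 255

# Markers: label 1 inside the bubble (e.g. from a text location),
# label 2 outside on the page background.
markers = np.zeros((10, 10), dtype=np.int16)
markers[4, 4] = 1
markers[0, 0] = 2

# The watershed floods from the markers and meets at the edge ridge,
# producing a label image in which label 1 is the bubble interior.
labels = ndimage.watershed_ift(edges, markers)

print(labels[3, 3])  # 1: interior claimed by the inside marker
print(labels[0, 9])  # 2: the rest of the page
```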
I am sorry for this very general answer, but it is too late here for actual coding. If the question is still open and you want more hints concerning my suggestion, I will elaborate in more detail. In any case, you could definitely have a look at the region-based segmentation section in the scikit-image docs.