Detect words and graphs in image and slice image into 1 image per word or graph

Tags:

I'm building a web app to help students with learning Maths.

The app needs to display Maths content that comes from LaTex files. These Latex files render (beautifully) to pdf that I can convert cleanly to svg thanks to pdf2svg.

The (svg or png or whatever image format) image looks something like this:

 _______________________________________
|                                       |
| 1. Word1 word2 word3 word4            |
|    a. Word5 word6 word7               |
|                                       |
|   ///////////Graph1///////////        |
|                                       |
|    b. Word8 word9 word10              |
|                                       |
| 2. Word11 word12 word13 word14        |
|                                       |
|_______________________________________|

Real example:

The web app intent is to manipulate and add content to this, leading to something like this:

 _______________________________________
|                                       |
| 1. Word1 word2                        | <-- New line break
|_______________________________________|
|                                       |
| -> NewContent1                        |  
|_______________________________________|
|                                       |
|   word3 word4                         |  
|_______________________________________|
|                                       |
| -> NewContent2                        |  
|_______________________________________|
|                                       |
|    a. Word5 word6 word7               |
|_______________________________________|
|                                       |
|   ///////////Graph1///////////        |
|_______________________________________|
|                                       |
| -> NewContent3                        |  
|_______________________________________|
|                                       |
|    b. Word8 word9 word10              |
|_______________________________________|
|                                       |
| 2. Word11 word12 word13 word14        |
|_______________________________________|

Example:

A large single image cannot give me the flexibility to do this kind of manipulations.

But if the image file was broken down into smaller files which hold single words and single Graphs I could do these manipulations.

What I think I need to do is detect whitespace in the image, and slice the image into multiple sub-images, looking something like this:

 _______________________________________
|          |       |       |            |
| 1. Word1 | word2 | word3 | word4      |
|__________|_______|_______|____________|
|             |       |                 |
|    a. Word5 | word6 | word7           |
|_____________|_______|_________________|
|                                       |
|   ///////////Graph1///////////        |
|_______________________________________|
|             |       |                 |
|    b. Word8 | word9 | word10          |
|_____________|_______|_________________|
|           |        |        |         |
| 2. Word11 | word12 | word13 | word14  |
|___________|________|________|_________|

I'm looking for a way to do this. What do you think is the way to go?

Thank you for your help!

810

asked Aug 19 '17 14:08

alphablend.dev

1 Answers

I would use horizontal and vertical projection to first segment the image into lines, and then each line into smaller slices (e.g. words).

Start by converting the image to grayscale, and then invert it, so that gaps contain zeros and any text/graphics are non-zero.

img = cv2.imread('article.png', cv2.IMREAD_COLOR)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray_inverted = 255 - img_gray

Calculate horizontal projection -- mean intensity per row, using cv2.reduce, and flatten it to a linear array.

row_means = cv2.reduce(img_gray_inverted, 1, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()

Now find the row ranges for all the contiguous gaps. You can use the function provided in this answer.

row_gaps = zero_runs(row_means)

Finally calculate the midpoints of the gaps, that we will use to cut the image up.

row_cutpoints = (row_gaps[:,0] + row_gaps[:,1] - 1) / 2

You end up with something like this situation (gaps are pink, cutpoints red):

Visualization of horizontal projection, gaps and cutpoints

Next step would be to process each identified line.

bounding_boxes = []
for n,(start,end) in enumerate(zip(row_cutpoints, row_cutpoints[1:])):
    line = img[start:end]
    line_gray_inverted = img_gray_inverted[start:end]

Calculate the vertical projection (average intensity per column), find the gaps and cutpoints. Additionally, calculate gap sizes, to allow filtering out the small gaps between individual letters.

column_means = cv2.reduce(line_gray_inverted, 0, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
column_gaps = zero_runs(column_means)
column_gap_sizes = column_gaps[:,1] - column_gaps[:,0]
column_cutpoints = (column_gaps[:,0] + column_gaps[:,1] - 1) / 2

Filter the cutpoints.

filtered_cutpoints = column_cutpoints[column_gap_sizes > 5]

And create a list of bounding boxes for each segment.

for xstart,xend in zip(filtered_cutpoints, filtered_cutpoints[1:]):
    bounding_boxes.append(((xstart, start), (xend, end)))

Now you end up with something like this (again gaps are pink, cutpoints red):

Visualization of horizontal projection, gaps and cutpoints

Now you can cut up the image. I'll just visualize the bounding boxes found:

Visualization of bounding boxes

The full script:

import cv2
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec


def plot_horizontal_projection(file_name, img, projection):
    fig = plt.figure(1, figsize=(12,16))
    gs = gridspec.GridSpec(1, 2, width_ratios=[3,1])

    ax = plt.subplot(gs[0])
    im = ax.imshow(img, interpolation='nearest', aspect='auto')
    ax.grid(which='major', alpha=0.5)

    ax = plt.subplot(gs[1])
    ax.plot(projection, np.arange(img.shape[0]), 'm')
    ax.grid(which='major', alpha=0.5)
    plt.xlim([0.0, 255.0])
    plt.ylim([-0.5, img.shape[0] - 0.5])
    ax.invert_yaxis()

    fig.suptitle("FOO", fontsize=16)
    gs.tight_layout(fig, rect=[0, 0.03, 1, 0.97])  

    fig.set_dpi(200)

    fig.savefig(file_name, bbox_inches='tight', dpi=fig.dpi)
    plt.clf() 

def plot_vertical_projection(file_name, img, projection):
    fig = plt.figure(2, figsize=(12, 4))
    gs = gridspec.GridSpec(2, 1, height_ratios=[1,5])

    ax = plt.subplot(gs[0])
    im = ax.imshow(img, interpolation='nearest', aspect='auto')
    ax.grid(which='major', alpha=0.5)

    ax = plt.subplot(gs[1])
    ax.plot(np.arange(img.shape[1]), projection, 'm')
    ax.grid(which='major', alpha=0.5)
    plt.xlim([-0.5, img.shape[1] - 0.5])
    plt.ylim([0.0, 255.0])

    fig.suptitle("FOO", fontsize=16)
    gs.tight_layout(fig, rect=[0, 0.03, 1, 0.97])  

    fig.set_dpi(200)

    fig.savefig(file_name, bbox_inches='tight', dpi=fig.dpi)
    plt.clf() 

def visualize_hp(file_name, img, row_means, row_cutpoints):
    row_highlight = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    row_highlight[row_means == 0, :, :] = [255,191,191]
    row_highlight[row_cutpoints, :, :] = [255,0,0]
    plot_horizontal_projection(file_name, row_highlight, row_means)

def visualize_vp(file_name, img, column_means, column_cutpoints):
    col_highlight = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    col_highlight[:, column_means == 0, :] = [255,191,191]
    col_highlight[:, column_cutpoints, :] = [255,0,0]
    plot_vertical_projection(file_name, col_highlight, column_means)


# From https://stackoverflow.com/a/24892274/3962537
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges


img = cv2.imread('article.png', cv2.IMREAD_COLOR)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray_inverted = 255 - img_gray

row_means = cv2.reduce(img_gray_inverted, 1, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
row_gaps = zero_runs(row_means)
row_cutpoints = (row_gaps[:,0] + row_gaps[:,1] - 1) / 2

visualize_hp("article_hp.png", img, row_means, row_cutpoints)

bounding_boxes = []
for n,(start,end) in enumerate(zip(row_cutpoints, row_cutpoints[1:])):
    line = img[start:end]
    line_gray_inverted = img_gray_inverted[start:end]

    column_means = cv2.reduce(line_gray_inverted, 0, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
    column_gaps = zero_runs(column_means)
    column_gap_sizes = column_gaps[:,1] - column_gaps[:,0]
    column_cutpoints = (column_gaps[:,0] + column_gaps[:,1] - 1) / 2

    filtered_cutpoints = column_cutpoints[column_gap_sizes > 5]

    for xstart,xend in zip(filtered_cutpoints, filtered_cutpoints[1:]):
        bounding_boxes.append(((xstart, start), (xend, end)))

    visualize_vp("article_vp_%02d.png" % n, line, column_means, filtered_cutpoints)

result = img.copy()

for bounding_box in bounding_boxes:
    cv2.rectangle(result, bounding_box[0], bounding_box[1], (255,0,0), 2)

cv2.imwrite("article_boxes.png", result)

183

answered Sep 30 '22 16:09

Dan Mašek

Related questions
                            
                                opencv2: bicubic interpolation while resizing image
                            
                                OpenCV 3 KNN implementation
                            
                                cmake for OpenCV doesn't detect jdk
                            
                                findHomography not working in OpenCV 3.0
                            
                                Using OpenCV in eclipse
                            
                                Is it possible to zoom and focus using OpenCV on Android?
                            
                                Edge refinement OpenCV android
                            
                                How to know version of library found by CMake?
                            
                                What is the meaning of values in stage.xml and cascade.xml for opencv cascade classifier
                            
                                Is there a cheatsheet for OpenCV 3? what are key differences in v3?
                            
                                Coordinates of every point on a circle's circumference
                            
                                BeagleBone Black OpenCV Python is too slow
                            
                                android compare 2 images and highlight difference
                            
                                OpenCV : How to find the pixels inside a contour in c++
                            
                                How to choose Full Frames (Uncompressed) as codec for VideoWriter
                            
                                Getting mean of an image using a mask
                            
                                How to count the number of pixels with a certain pixel value in python opencv?
                            
                                python opencv error in Emacs when running cv2.Canny()
                            
                                Reading .exr files in OpenCV
                            
                                Unable to extract shape_predictor_68_face_landmarks.dat for bz

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Detect words and graphs in image and slice image into 1 image per word or graph

Tags:

image-processing

opencv

whitespace

edge-detection

alphablend.dev

People also ask

1 Answers

Dan Mašek

Recent Activity

Donate For Us