Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Detect words and graphs in image and slice image into 1 image per word or graph

I'm building a web app to help students with learning Maths.

The app needs to display Maths content that comes from LaTex files. These Latex files render (beautifully) to pdf that I can convert cleanly to svg thanks to pdf2svg.

The (svg or png or whatever image format) image looks something like this:

 _______________________________________
|                                       |
| 1. Word1 word2 word3 word4            |
|    a. Word5 word6 word7               |
|                                       |
|   ///////////Graph1///////////        |
|                                       |
|    b. Word8 word9 word10              |
|                                       |
| 2. Word11 word12 word13 word14        |
|                                       |
|_______________________________________|

Real example:


The web app intent is to manipulate and add content to this, leading to something like this:

 _______________________________________
|                                       |
| 1. Word1 word2                        | <-- New line break
|_______________________________________|
|                                       |
| -> NewContent1                        |  
|_______________________________________|
|                                       |
|   word3 word4                         |  
|_______________________________________|
|                                       |
| -> NewContent2                        |  
|_______________________________________|
|                                       |
|    a. Word5 word6 word7               |
|_______________________________________|
|                                       |
|   ///////////Graph1///////////        |
|_______________________________________|
|                                       |
| -> NewContent3                        |  
|_______________________________________|
|                                       |
|    b. Word8 word9 word10              |
|_______________________________________|
|                                       |
| 2. Word11 word12 word13 word14        |
|_______________________________________|

Example:


A large single image cannot give me the flexibility to do this kind of manipulations.

But if the image file was broken down into smaller files which hold single words and single Graphs I could do these manipulations.

What I think I need to do is detect whitespace in the image, and slice the image into multiple sub-images, looking something like this:

 _______________________________________
|          |       |       |            |
| 1. Word1 | word2 | word3 | word4      |
|__________|_______|_______|____________|
|             |       |                 |
|    a. Word5 | word6 | word7           |
|_____________|_______|_________________|
|                                       |
|   ///////////Graph1///////////        |
|_______________________________________|
|             |       |                 |
|    b. Word8 | word9 | word10          |
|_____________|_______|_________________|
|           |        |        |         |
| 2. Word11 | word12 | word13 | word14  |
|___________|________|________|_________|

I'm looking for a way to do this. What do you think is the way to go?

Thank you for your help!

like image 810
alphablend.dev Avatar asked Aug 19 '17 14:08

alphablend.dev


People also ask

What kind of graph is best for showing change over time in a set of data?

. . . a Line graph. Line graphs are used to track changes over short and long periods of time. When smaller changes exist, line graphs are better to use than bar graphs. Line graphs can also be used to compare changes over the same period of time for more than one group.

Which graph to use for which data?

If you want to compare values, use a pie chart — for relative comparison — or bar charts — for precise comparison. If you want to compare volumes, use an area chart or a bubble chart. If you want to show trends and patterns in your data, use a line chart, bar chart, or scatter plot.

What statistical graph is best used to show the relative sizes of data?

A pie graph (sometimes called a pie chart) is used to show how an overall total is divided into parts. A circle represents a group as a whole. The slices of this circular “pie” show the relative sizes of subgroups.


1 Answers

I would use horizontal and vertical projection to first segment the image into lines, and then each line into smaller slices (e.g. words).

Start by converting the image to grayscale, and then invert it, so that gaps contain zeros and any text/graphics are non-zero.

img = cv2.imread('article.png', cv2.IMREAD_COLOR)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray_inverted = 255 - img_gray

Calculate horizontal projection -- mean intensity per row, using cv2.reduce, and flatten it to a linear array.

row_means = cv2.reduce(img_gray_inverted, 1, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()

Now find the row ranges for all the contiguous gaps. You can use the function provided in this answer.

row_gaps = zero_runs(row_means)

Finally calculate the midpoints of the gaps, that we will use to cut the image up.

row_cutpoints = (row_gaps[:,0] + row_gaps[:,1] - 1) / 2

You end up with something like this situation (gaps are pink, cutpoints red):

Visualization of horizontal projection, gaps and cutpoints


Next step would be to process each identified line.

bounding_boxes = []
for n,(start,end) in enumerate(zip(row_cutpoints, row_cutpoints[1:])):
    line = img[start:end]
    line_gray_inverted = img_gray_inverted[start:end]

Calculate the vertical projection (average intensity per column), find the gaps and cutpoints. Additionally, calculate gap sizes, to allow filtering out the small gaps between individual letters.

column_means = cv2.reduce(line_gray_inverted, 0, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
column_gaps = zero_runs(column_means)
column_gap_sizes = column_gaps[:,1] - column_gaps[:,0]
column_cutpoints = (column_gaps[:,0] + column_gaps[:,1] - 1) / 2

Filter the cutpoints.

filtered_cutpoints = column_cutpoints[column_gap_sizes > 5]

And create a list of bounding boxes for each segment.

for xstart,xend in zip(filtered_cutpoints, filtered_cutpoints[1:]):
    bounding_boxes.append(((xstart, start), (xend, end)))

Now you end up with something like this (again gaps are pink, cutpoints red):

Visualization of horizontal projection, gaps and cutpoints


Now you can cut up the image. I'll just visualize the bounding boxes found:

Visualization of bounding boxes


The full script:

import cv2
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec


def plot_horizontal_projection(file_name, img, projection):
    fig = plt.figure(1, figsize=(12,16))
    gs = gridspec.GridSpec(1, 2, width_ratios=[3,1])

    ax = plt.subplot(gs[0])
    im = ax.imshow(img, interpolation='nearest', aspect='auto')
    ax.grid(which='major', alpha=0.5)

    ax = plt.subplot(gs[1])
    ax.plot(projection, np.arange(img.shape[0]), 'm')
    ax.grid(which='major', alpha=0.5)
    plt.xlim([0.0, 255.0])
    plt.ylim([-0.5, img.shape[0] - 0.5])
    ax.invert_yaxis()

    fig.suptitle("FOO", fontsize=16)
    gs.tight_layout(fig, rect=[0, 0.03, 1, 0.97])  

    fig.set_dpi(200)

    fig.savefig(file_name, bbox_inches='tight', dpi=fig.dpi)
    plt.clf() 

def plot_vertical_projection(file_name, img, projection):
    fig = plt.figure(2, figsize=(12, 4))
    gs = gridspec.GridSpec(2, 1, height_ratios=[1,5])

    ax = plt.subplot(gs[0])
    im = ax.imshow(img, interpolation='nearest', aspect='auto')
    ax.grid(which='major', alpha=0.5)

    ax = plt.subplot(gs[1])
    ax.plot(np.arange(img.shape[1]), projection, 'm')
    ax.grid(which='major', alpha=0.5)
    plt.xlim([-0.5, img.shape[1] - 0.5])
    plt.ylim([0.0, 255.0])

    fig.suptitle("FOO", fontsize=16)
    gs.tight_layout(fig, rect=[0, 0.03, 1, 0.97])  

    fig.set_dpi(200)

    fig.savefig(file_name, bbox_inches='tight', dpi=fig.dpi)
    plt.clf() 

def visualize_hp(file_name, img, row_means, row_cutpoints):
    row_highlight = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    row_highlight[row_means == 0, :, :] = [255,191,191]
    row_highlight[row_cutpoints, :, :] = [255,0,0]
    plot_horizontal_projection(file_name, row_highlight, row_means)

def visualize_vp(file_name, img, column_means, column_cutpoints):
    col_highlight = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    col_highlight[:, column_means == 0, :] = [255,191,191]
    col_highlight[:, column_cutpoints, :] = [255,0,0]
    plot_vertical_projection(file_name, col_highlight, column_means)


# From https://stackoverflow.com/a/24892274/3962537
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges


img = cv2.imread('article.png', cv2.IMREAD_COLOR)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray_inverted = 255 - img_gray

row_means = cv2.reduce(img_gray_inverted, 1, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
row_gaps = zero_runs(row_means)
row_cutpoints = (row_gaps[:,0] + row_gaps[:,1] - 1) / 2

visualize_hp("article_hp.png", img, row_means, row_cutpoints)

bounding_boxes = []
for n,(start,end) in enumerate(zip(row_cutpoints, row_cutpoints[1:])):
    line = img[start:end]
    line_gray_inverted = img_gray_inverted[start:end]

    column_means = cv2.reduce(line_gray_inverted, 0, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
    column_gaps = zero_runs(column_means)
    column_gap_sizes = column_gaps[:,1] - column_gaps[:,0]
    column_cutpoints = (column_gaps[:,0] + column_gaps[:,1] - 1) / 2

    filtered_cutpoints = column_cutpoints[column_gap_sizes > 5]

    for xstart,xend in zip(filtered_cutpoints, filtered_cutpoints[1:]):
        bounding_boxes.append(((xstart, start), (xend, end)))

    visualize_vp("article_vp_%02d.png" % n, line, column_means, filtered_cutpoints)

result = img.copy()

for bounding_box in bounding_boxes:
    cv2.rectangle(result, bounding_box[0], bounding_box[1], (255,0,0), 2)

cv2.imwrite("article_boxes.png", result)
like image 183
Dan Mašek Avatar answered Sep 30 '22 16:09

Dan Mašek