Detect horizontal blank lines in .pdf form image with OpenCV

Question

I have .pdf files that have been converted to .jpg images for this project. My goal is to identify the blanks (e.g ____________) that you would generally find in a .pdf form that indicate a space for the user to sign of fill out some kind of information. I have been using edge detection with the cv2.Canny() and cv2.HoughlinesP() functions.

This works fairly well, but there are quite a few false positives that come about from seemingly nowhere. When I look at the 'edges' file it shows a bunch of noise around the other words. I'm uncertain where this noise comes from.

Should I continue to tweak the parameters, or is there a better method to find the location of these blanks?

nathancy · Accepted Answer

Assuming that you're trying to find horizontal lines on a .pdf form, here's a simple approach:

Convert image to grayscale and adaptive threshold image
Construct special kernel to detect only horizontal lines
Perform morphological transformations
Find contours and draw onto image

Using this example image

Convert to grayscale and adaptive threshold to obtain a binary image

gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

Then we create a kernel with cv2.getStructuringElement() and perform morphological transformations to isolate horizontal lines

horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

From here we can use cv2.HoughLinesP() to detect lines but since we have already preprocessed the image and isolated the horizontal lines, we can just find contours and draw the result

cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

for c in cnts:
    cv2.drawContours(image, [c], -1, (36,255,12), 3)

Full code

import cv2

image = cv2.imread('2.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

for c in cnts:
    cv2.drawContours(image, [c], -1, (36,255,12), 3)

cv2.imshow('thresh', thresh)
cv2.imshow('detected_lines', detected_lines)
cv2.imshow('image', image)
cv2.waitKey()

Detect horizontal blank lines in .pdf form image with OpenCV

Tags:

python

image

image-processing

opencv

computer-vision

Jacob Ferraiolo

1 Answers

nathancy

Recent Activity

Donate For Us

Detect horizontal blank lines in .pdf form image with OpenCV

Tags:

python

image

image-processing

opencv

computer-vision

Jacob Ferraiolo

1 Answers

nathancy

Related questions

Recent Activity

Donate For Us