Do you know of an algorithm that can see that there is handwriting on an image? I am not interested in knowing what the handwriting says, but only that there is one present?
I have a video of someone filling a slide with handwriting. My goal is to determine how much of the slide has been filled with handwriting already.
The video in question can be downloaded here: http://www.filedropper.com/00_6
For this particular video, a great solution was already suggested in Quantify how much a slide has been filled with handwriting
The solution is based on summing the amount of the specific color used for the handwriting. However, if the handwriting is not in blue but any other color that can also be found on non-handwriting, this approach will not work.
Therefore, I am interested to know, if there exists a more general solution to determine if there is handwriting present on an image?
What I have done so far: I was thinking of extracting the contours of an image, and then somehow detect the handwriting part based on how curvy the contours are (but I have no clue how to do that part). it might not be the best idea, though, as again it's not always correct...
import cv2
import matplotlib.pyplot as plt
img = cv2.imread(PATH TO IMAGE)
print("img shape=", img.shape)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow("image", gray)
cv2.waitKey(1)
#### extract all contours
# Find Canny edges
edged = cv2.Canny(gray, 30, 200)
cv2.waitKey(0)
# Finding Contours
# Use a copy of the image e.g. edged.copy()
# since findContours alters the image
contours, hierarchy = cv2.findContours(edged,
cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cv2.imshow('Canny Edges After Contouring', edged)
cv2.waitKey(0)
print("Number of Contours found = " + str(len(contours)))
# Draw all contours
# -1 signifies drawing all contours
cv2.drawContours(img, contours, -1, (0, 255, 0), 3)
cv2.imshow('Contours', img)
cv2.waitKey(0)
You can identify the space taken by hand-writing by masking the pixels from the template, and then do the same for the difference between further frames and the template. You can use dilation, opening, and thresholding for this.
Let's start with your template. Let's identify the parts we will mask:
import cv2
import numpy as np
template = cv2.imread('template.jpg')
Now, let's broaden the occupied pixels to make a zone that we will mask (hide) later:
template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
kernel = np.ones((5, 5),np.uint8)
dilation = cv2.dilate(255 - template, kernel,iterations = 5)
Then, we will threshold to turn this into a black and white mask:
_, thresh = cv2.threshold(dilation,25,255,cv2.THRESH_BINARY_INV)
In later frames, we will subtract this mask from the picture, by turning all these pixels to white. For instance:
import numpy as np
import cv2
vidcap = cv2.VideoCapture('0_0.mp4')
success,image = vidcap.read()
count = 0
frames = []
while count < 500:
frames.append(image)
success,image = vidcap.read()
count += 1
mask = np.where(thresh == 0)
example = frames[300]
example[mask] = [255, 255, 255]
cv2.imshow('', example)
cv2.waitKey(0)
Now, we will create a function that will return the difference between the template and a given picture. We will also use opening to get rid of the left over single pixels that would make it ugly.
def difference_with_mask(image):
grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
kernel = np.ones((5, 5), np.uint8)
dilation = cv2.dilate(255 - grayscale, kernel, iterations=5)
_, thresh = cv2.threshold(dilation, 25, 255, cv2.THRESH_BINARY_INV)
thresh[mask] = 255
closing = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
return closing
cv2.imshow('', difference_with_mask(frames[400]))
cv2.waitKey(0)
To address the fact that you don't want to have the hand detected as hand-writing, I suggest that instead of using the mask for every individual frame, you use the 95th percentile of the 15 last 30th frame... hang on. Look at this:
results = []
for ix, frame in enumerate(frames):
if ix % 30 == 0:
history.append(frame)
results.append(np.quantile(history, 0.95, axis=0))
print(ix)
Now, the example frame becomes this (the hand is removed because it wasn't mostly present in the 15 last 30th frames):
As you can see a little part of the hand-writing is missing. It will come later, because of the time-dependent percentile transformation we're doing. You'll see later: in my example with frame 18,400, the text that is missing in the image above is present. Then, you can use the function I gave you and this will be the result:
And here we go! Note that this solution, which doesn't include the hand, will take longer to compute because there's a few calculations needing to be done. Using just an image with no regard to the hand would calculate instantly, to the extent that you could probably run it on your webcam feed in real time.
Final Example:
Here's the frame 18,400:
Final image:
You can play with the function if you want the mask to wrap more thinly around the text:
Full code:
import os
import numpy as np
import cv2
vidcap = cv2.VideoCapture('0_0.mp4')
success,image = vidcap.read()
count = 0
from collections import deque
frames = deque(maxlen=700)
while count < 500:
frames.append(image)
success,image = vidcap.read()
count += 1
template = cv2.imread('template.jpg')
template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
kernel = np.ones((5, 5),np.uint8)
dilation = cv2.dilate(255 - template, kernel,iterations = 5)
cv2.imwrite('dilation.jpg', dilation)
cv2.imshow('', dilation)
cv2.waitKey(0)
_, thresh = cv2.threshold(dilation,25,255,cv2.THRESH_BINARY_INV)
cv2.imwrite('thresh.jpg', thresh)
cv2.imshow('', thresh)
cv2.waitKey(0)
mask = np.where(thresh == 0)
example = frames[400]
cv2.imwrite('original.jpg', example)
cv2.imshow('', example)
cv2.waitKey(0)
example[mask] = 255
cv2.imwrite('example_masked.jpg', example)
cv2.imshow('', example)
cv2.waitKey(0)
def difference_with_mask(image):
grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
kernel = np.ones((5, 5), np.uint8)
dilation = cv2.dilate(255 - grayscale, kernel, iterations=5)
_, thresh = cv2.threshold(dilation, 25, 255, cv2.THRESH_BINARY_INV)
thresh[mask] = 255
closing = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
return closing
cv2.imshow('', difference_with_mask(frames[400]))
cv2.waitKey(0)
masked_example = difference_with_mask(frames[400])
cv2.imwrite('masked_example.jpg', masked_example)
from collections import deque
history = deque(maxlen=15)
results = []
for ix, frame in enumerate(frames):
if ix % 30 == 0:
history.append(frame)
results.append(np.quantile(history, 0.95, axis=0))
print(ix)
if ix > 500:
break
cv2.imshow('', frames[400])
cv2.waitKey(0)
cv2.imshow('', results[400].astype(np.uint8))
cv2.imwrite('percentiled_frame.jpg', results[400].astype(np.uint8))
cv2.waitKey(0)
cv2.imshow('', difference_with_mask(results[400].astype(np.uint8)))
cv2.imwrite('final.jpg', difference_with_mask(results[400].astype(np.uint8)))
cv2.waitKey(0)
You could try to make a template before detection which you could use to deduct it on the current frame of the video. One way you could make such a template is to iterate through every pixel of the frame and look-up if it has a higher value (white) in that coordinate than the value that is stored in the list.
Here is an example of such a template from your video by iterating through the first two seconds:
Once you have that it is simple to detect the text. You can use the cv2.absdiff()
function to make difference of template and frame. Here is an example:
Once you have this image it is trivial to search for writting (threshold + contour search or something similar).
Here is an example code:
import numpy as np
import cv2
cap = cv2.VideoCapture('0_0.mp4') # read video
bgr = cap.read()[1] # get first frame
frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY) # transform to grayscale
template = frame.copy() # make a copy of the grayscale
h, w = frame.shape[:2] # height, width
matrix = [] # a list for [y, x] coordinares
# fill matrix with all coordinates of the image (height x width)
for j in range(h):
for i in range(w):
matrix.append([j, i])
fps = cap.get(cv2.CAP_PROP_FPS) # frames per second of the video
seconds = 2 # How many seconds of the video you wish to look the template for
k = seconds * fps # calculate how many frames of the video is in that many seconds
i = 0 # some iterator to count the frames
lowest = [] # list that will store highest values of each pixel on the fram - that will build our template
# store the value of the first frame - just so you can compare it in the next step
for j in matrix:
y = j[0]
x = j[1]
lowest.append(template[y, x])
# loop through the number of frames calculated before
while(i < k):
bgr = cap.read()[1] # bgr image
frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY) # transform to grayscale
# iterate through every pixel (pixels are located in the matrix)
for l, j in enumerate(matrix):
y = j[0] # x coordinate
x = j[1] # y coordinate
temp = template[y, x] # value of pixel in template
cur = frame[y, x] # value of pixel in the current frame
if cur > temp: # if the current frame has higher value change the value in the "lowest" list
lowest[l] = cur
i += 1 # increment the iterator
# just for vizualization
cv2.imshow('frame', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
i = 0 # new iteratir to increment position in the "lowest" list
template = np.ones((h, w), dtype=np.uint8)*255 # new empty white image
# iterate through the matrix and change the value of the new empty white image to that value
# in the "lowest" list
for j in matrix:
template[j[0], j[1]] = lowest[i]
i += 1
# just for visualization - template
cv2.imwrite("template.png", template)
cv2.imshow("template", template)
cv2.waitKey(0)
cv2.destroyAllWindows()
counter = 0 # counter of countours: logicaly if the number of countours would
# rapidly decrease than that means that a new template is in order
mean_compare = 0 # this is needed for a simple color checker if the contour is
# the same color as the oders
# this is the difference between the frame of the video and created template
while(cap.isOpened()):
bgr = cap.read()[1] # bgr image
frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY) # grayscale
img = cv2.absdiff(template, frame) # resulted difference
thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1] # thresholded image
kernel = np.ones((5, 5), dtype=np.uint8) # simple kernel
thresh = cv2.dilate(thresh, kernel, iterations=1) # dilate thresholded image
cnts, h = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # contour search
if len(cnts) < counter*0.5 and counter > 50: # check if new template is in order
# search for new template again
break
else:
counter = len(cnts) # update counter
for cnt in cnts: # iterate through contours
size = cv2.contourArea(cnt) # size of contours - to filter out noise
if 20 < size < 30000: # noise criterion
mask = np.zeros(frame.shape, np.uint8) # empry mask - needed for color compare
cv2.drawContours(mask, [cnt], -1, 255, -1) # draw contour on mask
mean = cv2.mean(bgr, mask=mask) # the mean color of the contour
if not mean_compare: # first will set the template color
mean_compare = mean
else:
k1 = 0.85 # koeficient how much each channels value in rgb image can be smaller
k2 = 1.15 # koeficient how much each channels value in rgb image can be bigger
# condition
b = bool(mean_compare[0] * k1 < mean[0] < mean_compare[0] * k2)
g = bool(mean_compare[1] * k1 < mean[1] < mean_compare[1] * k2)
r = bool(mean_compare[2] * k1 < mean[2] < mean_compare[2] * k2)
if b and g and r:
cv2.drawContours(bgr, [cnt], -1, (0, 255, 0), 2) # draw on rgb image
# just for visualization
cv2.imshow('img', bgr)
if cv2.waitKey(1) & 0xFF == ord('s'):
cv2.imwrite(str(j)+".png", img)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
# release the video object and destroy window
cap.release()
cv2.destroyAllWindows()
One possible result with a simple size and color filter:
NOTE: This template search algorithm is very slow because of the nested loops and can probably be optimized to make it faster - you need a little more math knowledge than me. Also, you will need to make a check if the template changes in the same video - I'm guessing that shouldn't be too difficult.
A simpler idea on how to make it a bit faster is to resize the frames to let's say 20% and make the same template search. After that resize it back to the original and dilate the template. It will not be as nice of a result but it will make a mask on where the text and lines of the template are. Then simply draw it over the frame.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With