I'm working on an OCR application in Python to read digits. I'm using OpenCV to find the contours in an image, crop the digit, and then preprocess it to 28x28 for the MNIST dataset. My images are not square, so I seem to lose a lot of quality when I resize them. Any tips or suggestions I could try?
This is the original image
This is after editing it
And this is the quality it should be
I've tried some tricks from http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html , such as dilation and opening, but they don't make it any better; they only make the digit blurrier...
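Roughly, this is what I mean by trying dilation and opening (applied to the thresholded digit image; the 3x3 kernel is just an example size, not necessarily the one I settled on):

import cv2
import numpy as np

# example structuring element; I also experimented with other sizes
kernel = np.ones((3, 3), np.uint8)

# opening: erosion followed by dilation, meant to remove small noise specks
opened = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)

# dilation: thickens the strokes of the digit
dilated = cv2.dilate(gray, kernel, iterations=1)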
This is the code I'm using (find the contour, crop it, resize it, threshold it, and then center it):
import math

import numpy as np
import cv2
import imutils
from imutils.perspective import four_point_transform
from scipy import ndimage

images = np.zeros((4, 784))
correct_vals = np.zeros((4, 10))
i = 0

def getBestShift(img):
    # shift needed to move the center of mass to the image center
    cy, cx = ndimage.center_of_mass(img)
    rows, cols = img.shape
    shiftx = np.round(cols / 2.0 - cx).astype(int)
    shifty = np.round(rows / 2.0 - cy).astype(int)
    return shiftx, shifty

def shift(img, sx, sy):
    rows, cols = img.shape
    M = np.float32([[1, 0, sx], [0, 1, sy]])
    shifted = cv2.warpAffine(img, M, (cols, rows))
    return shifted

for no in [1, 3, 4, 5]:
    image = cv2.imread("images/" + str(no) + ".jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(blurred, 50, 200, 255)

    cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)  # handles OpenCV 2/3/4 return formats
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
    displayCnt = None

    for c in cnts:
        # approximate the contour
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)
        # if the contour has four vertices, then we have found
        # the thermostat display
        if len(approx) == 4:
            displayCnt = approx
            break

    # perspective-correct the display region, invert, and shrink to 28x28
    warped = four_point_transform(gray, displayCnt.reshape(4, 2))
    gray = cv2.resize(255 - warped, (28, 28))
    (thresh, gray) = cv2.threshold(gray, 128, 255,
                                   cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # strip empty rows and columns around the digit
    while np.sum(gray[0]) == 0:
        gray = gray[1:]
    while np.sum(gray[:, 0]) == 0:
        gray = np.delete(gray, 0, 1)
    while np.sum(gray[-1]) == 0:
        gray = gray[:-1]
    while np.sum(gray[:, -1]) == 0:
        gray = np.delete(gray, -1, 1)

    # resize so the longer side is 20 px, preserving the aspect ratio
    rows, cols = gray.shape
    if rows > cols:
        factor = 20.0 / rows
        rows = 20
        cols = int(round(cols * factor))
        gray = cv2.resize(gray, (cols, rows))
    else:
        factor = 20.0 / cols
        cols = 20
        rows = int(round(rows * factor))
        gray = cv2.resize(gray, (cols, rows))

    # pad back out to 28x28 and center the digit on its center of mass
    colsPadding = (int(math.ceil((28 - cols) / 2.0)), int(math.floor((28 - cols) / 2.0)))
    rowsPadding = (int(math.ceil((28 - rows) / 2.0)), int(math.floor((28 - rows) / 2.0)))
    gray = np.pad(gray, (rowsPadding, colsPadding), 'constant')

    shiftx, shifty = getBestShift(gray)
    shifted = shift(gray, shiftx, shifty)
    gray = shifted

    cv2.imwrite("processed/" + str(no) + ".png", gray)
    cv2.imshow("imgs", gray)
    cv2.waitKey(0)
When you resize the image, make sure you select the interpolation method that best suits your needs. For this case, I recommend:
gray = cv2.resize(255 - warped, (28, 28), interpolation=cv2.INTER_AREA)
which gives a noticeably cleaner result after the rest of your processing.
You can see a comparison of methods here: http://tanbakuchi.com/posts/comparison-of-openv-interpolation-algorithms/ but since there are only a handful, you can try them all and see which gives the best results. It looks like the default is INTER_LINEAR.
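If you want to compare them quickly, a minimal sketch along these lines works; it assumes the warped variable from your loop, and the flag list and output filenames are just for illustration:

import cv2

# resize the warped digit with each interpolation flag and write the
# results out so they can be inspected side by side
flags = {
    "nearest": cv2.INTER_NEAREST,
    "linear": cv2.INTER_LINEAR,
    "area": cv2.INTER_AREA,
    "cubic": cv2.INTER_CUBIC,
    "lanczos": cv2.INTER_LANCZOS4,
}
for name, flag in flags.items():
    resized = cv2.resize(255 - warped, (28, 28), interpolation=flag)
    cv2.imwrite("processed/compare_" + name + ".png", resized)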