So I'm trying to create a program that can recognize which number is in an image and print that integer in the console. (I'm using Python 3.)
For example, the program should recognize that the following image (an actual image the program has to check) is the number 2:
I've tried to just compare it with another image with a 2 in it using cv2.matchTemplate(),
but each time the blue pixels' RGB values are a little bit different for each image, and the image could be a bit larger or smaller. For example the following image:
It also has to tell it apart from all the other blue number images (0-9), for example the following one:
I've tried multiple match-template codes and made a folder with number 0-9 images as templates, but each time almost every single template gets detected in the number that needs to be recognized. For example, number 5 gets detected in an image that is number 2. And if it doesn't detect all of them, it detects the wrong one(s).
The ones I've tried all come with the problems I described above.
I've also tried to measure what percentage of each image is blue, but those percentages were too close together to tell the numbers apart that way.
Does anyone have a solution? Am I being stupid for using cv2.matchTemplate(),
and is there a much simpler option? (I don't mind using a library for it, because this is part of a bigger piece of code, but I'd prefer to write it myself instead of relying on libraries.)
Instead of using template matching, a better approach is to use Pytesseract OCR to read the number with image_to_string(). But before performing OCR, you need to preprocess the image. For optimal OCR performance, the preprocessed image should have the desired text/numbers/characters in black with the background in white. A simple preprocessing step is to convert the image to grayscale, apply Otsu's threshold to obtain a binary image, then invert the image. Here's a visualization of the preprocessing steps:
Input image -> Grayscale -> Otsu's threshold -> Inverted image ready for OCR
Result from Pytesseract OCR
2
Here are the results with the other images:
2
5
We use the --psm 6 configuration option to assume a single uniform block of text. See here for more configuration options.
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold, then invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
invert = 255 - thresh
# Perform OCR with Pytesseract
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()
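As an optional tweak (not part of the code above), if you know the result will always be a digit 0-9, you can also restrict Tesseract to digits by adding a character whitelist to the same config string; support for this option can depend on your Tesseract version, so treat it as something to try:
# Optional: only allow digits 0-9 in the OCR output
digits_only = pytesseract.image_to_string(invert, lang='eng', config='--psm 6 -c tessedit_char_whitelist=0123456789')
print(digits_only)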
Note: If you insist on using template matching, you need to use scale-variant template matching. Take a look at how to isolate everything inside of a contour, scale it, and test the similarity to an image? and Python OpenCV line detection to detect X symbol in image for some examples. If you know for certain that your images are blue, then another approach would be to use color thresholding with cv2.inRange() to obtain a binary mask image, then apply OCR on that image.
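A rough sketch of that last idea; the HSV bounds below are an assumption and you would tune them to the actual shade of blue in your images:
import cv2
import numpy as np

image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Assumed HSV range for "blue" - adjust lower/upper to your images
lower = np.array([90, 50, 50])
upper = np.array([130, 255, 255])
mask = cv2.inRange(hsv, lower, upper)

# The mask has the digit in white on black; invert so the digit is black on white for OCR
ocr_ready = 255 - mask
cv2.imshow('mask', mask)
cv2.imshow('ocr_ready', ocr_ready)
cv2.waitKey()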
Given the lovely regular input, I expect that all you need is a simple comparison against templates. Since you didn't supply your code and output, it's hard to tell what might have gone wrong.
Very simply ...
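A minimal sketch of that idea, assuming the templates are stored as templates/0.png through templates/9.png (hypothetical filenames) and are roughly the same size as the input image:
import cv2

img = cv2.imread('unknown.png', cv2.IMREAD_GRAYSCALE)

best_digit, best_score = None, -1.0
for digit in range(10):
    template = cv2.imread('templates/%d.png' % digit, cv2.IMREAD_GRAYSCALE)
    template = cv2.resize(template, (img.shape[1], img.shape[0]))  # force identical size
    # Normalized correlation on same-size images gives a single score; 1.0 is a perfect match
    score = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)[0][0]
    if score > best_score:
        best_digit, best_score = digit, score

print(best_digit, best_score)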
You might also want to set a lower threshold for declaring a match, perhaps based on how well that template matches each of the other templates: any identification has to clearly exceed the match between two different templates.