I want to recognize digits from a credit card. To make things worse, the source image is not guaranteed to be of high quality. The OCR itself will be done by a neural network, but that is not the topic here.
The current issue is the image preprocessing. Because credit cards can have patterned backgrounds and other complex graphics, the text is not as clear as in a scanned document. I experimented with edge detection (Canny, Sobel), but without much success. Calculating the difference between the greyscale image and a blurred copy of it (as suggested in "Remove background color in image processing for OCR") did not lead to an OCR-able result either.
I think most approaches fail because the contrast between a specific digit and its background is not strong enough. Perhaps the image needs to be segmented into blocks, with the best preprocessing chosen for each block?
Do you have any suggestions for how to convert the source into a readable binary image? Is edge detection the way to go, or should I stick with basic color thresholding?
Here is a sample of a greyscale-thresholding approach (where I am obviously not happy with the results):
Original image:
Greyscale image:
Thresholded image:
Thanks for any advice, Valentin
The following results are for Tesseract: the original set of samples achieves a precision of 0.907 and a recall of 0.901, while the preprocessed set achieves a precision of 0.929 and a recall of 0.928.
If it's at all possible, request that better lighting be used to capture the images. A low-angle light would illuminate the edges of the raised (or sunken) characters, thus greatly improving the image quality. If the image is meant to be analyzed by a machine, then the lighting should be optimized for machine readability.
That said, one algorithm you should look into is the Stroke Width Transform, which is used to extract characters from natural images.
Stroke Width Transform (SWT) implementation (Java, C#...)
A global threshold (for binarization or clipping edge strengths) probably won't cut it for this application, and instead you should look at localized thresholds. In your example images the "02" following the "31" is particularly weak, so searching for the strongest local edges in that region would be better than filtering all edges in the character string using a single threshold.
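To make the idea of a localized threshold concrete, here is a minimal pure-Python sketch. It assumes the digits are darker than their immediate surroundings. The function name `local_mean_threshold` and the `radius` and `offset` parameters are illustrative choices of mine, not taken from any particular library. Each pixel is compared against the mean of its own neighbourhood instead of a single global value.

```python
def local_mean_threshold(gray, radius=2, offset=15):
    """Return a binary image: 1 where a pixel is darker than the mean of its
    (2*radius+1)-square neighbourhood by more than `offset`, else 0.

    `gray` is a list of rows of integer intensities (assumption: dark text
    on a locally lighter background)."""
    h, w = len(gray), len(gray[0])

    # Summed-area table with a zero border, so any window sum costs O(1).
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum

    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            # Clip the window at the image border.
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 1 if gray[y][x] < mean - offset else 0
    return out


# Background fades from 200 down to 110; two "stroke" pixels sit 40 below it.
row = [200, 190, 140, 170, 160, 150, 100, 130, 120, 110]
print(local_mean_threshold([row]))
# [[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]]
```

In this toy row no single global cutoff isolates both strokes: any value high enough to catch the 140 pixel also catches the darker background on the right. The window size and offset would need tuning for real card images, and a sharp step in the background can still produce false positives.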
If you can identify partial segments of characters, then you might use some directional morphology operations to help join segments. For example, if you have two nearly horizontal segments like the following, where 0 is the background and 1 is the foreground...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0
then you could perform a morphological "close" operation along the horizontal direction only to join those segments. The kernel could be something like
x x x x x
1 1 1 1 1
x x x x x
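As a rough illustration, a horizontal-only close can be sketched in plain Python as a dilation followed by an erosion, each applied along rows only. I am assuming here that the x entries in the kernel mean "ignored", so only the middle row of five 1s takes part. The function names are hypothetical, and out-of-image pixels are simply left out of each window.

```python
def dilate_row(row, radius):
    """A pixel becomes 1 if any pixel within `radius` along the row is 1."""
    n = len(row)
    return [max(row[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def erode_row(row, radius):
    """A pixel stays 1 only if every pixel within `radius` along the row is 1."""
    n = len(row)
    return [min(row[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def close_horizontal(image, width=5):
    """Morphological close with a 1 x `width` kernel: dilate, then erode,
    working on each row independently so nothing is joined vertically."""
    radius = width // 2
    return [erode_row(dilate_row(row, radius), radius) for row in image]


# The two nearly horizontal segments from the example above.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0],
]
for row in close_horizontal(image):
    print(*row)
# 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
```

With a width of 5, gaps of up to four pixels along a row are bridged while wider gaps stay open, so the kernel width directly controls how aggressively segments are joined.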
There are more sophisticated methods to perform curve completion using Bezier fits or even Euler spirals (a.k.a. clothoids), but preprocessing to identify segments to be joined and postprocessing to eliminate poor joins can get very tricky.