I'm looking to perform optical character recognition (OCR) on a display, and want the program to work under different light conditions. To do this, I need to process and threshold the image such that there is no noise surrounding each digit, allowing me to detect the contour of the digit and perform OCR from there. I need the threshold value I use to be adaptable to these different light conditions. I've tried adaptive thresholding, but I haven't been able to get it to work.
My image processing is simple: load the image (i), grayscale i (g), apply a histogram equalization to g (h), and apply a binary threshold to h with a threshold value = t. I've worked with a couple of different datasets, and found that the optimal threshold value to make the OCR work consistently lies within the range of highest density in a histogram plot of (h) (the only part of the plot without gaps).
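In cv2 terms, the pipeline is roughly this (the file name and the value of t are just placeholders):

```python
import cv2

i = cv2.imread("display.png")                    # load the image (i); placeholder path
g = cv2.cvtColor(i, cv2.COLOR_BGR2GRAY)          # grayscale (g)
h = cv2.equalizeHist(g)                          # histogram equalization (h)
t = 200                                          # example threshold inside [190, 220]
_, binary = cv2.threshold(h, t, 255, cv2.THRESH_BINARY)  # binary threshold of h at t
```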
(Figure: a histogram of (h); the values t = [190, 220] are optimal for OCR.) A more complete set of images describing my problem is available here: http://imgur.com/a/wRgi7
My current solution, which works but is clunky and slow, checks for:
1. There must be 3 digits
2. The first digit must be reasonably small in size
3. There must be at least one contour recognized as a digit
4. The digit must be recognized in the digit dictionary
If not all of these checks pass, the threshold is increased by 10 (starting from a low value) and another attempt is made; a sketch of this loop is below.
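Roughly, the search looks like this. `looks_like_digit` is a simplified stand-in for checks 2-4 (here it only filters tiny noise contours by area), and the starting value of 100 is just an example:

```python
import cv2

def looks_like_digit(contour, min_area=50):
    """Stand-in for checks 2-4 (size and dictionary recognition);
    here it only rejects tiny noise contours by area."""
    return cv2.contourArea(contour) >= min_area

def find_working_threshold(h, start=100, step=10):
    """Brute-force search: raise the threshold by `step` until the checks pass."""
    for t in range(start, 256, step):
        _, binary = cv2.threshold(h, t, 255, cv2.THRESH_BINARY)
        # OpenCV 4.x: findContours returns (contours, hierarchy).
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        digits = [c for c in contours if looks_like_digit(c)]
        if len(digits) == 3:          # check 1: there must be 3 digits
            return t
    return None
```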
The fact that I can recognize the optimal threshold value on the histogram plot of (h) may just be confirmation bias, but I'd like to know if there's a way I can extract that value. This is different from how I've worked with histograms before, which has focused more on finding peaks/valleys.
I'm using cv2 for image processing and matplotlib.pyplot for the histogram plots.
The midpoint method finds an appropriate threshold value in an iterative fashion. First, pick a reasonable initial threshold value. Then, compute the mean of the pixel values below the threshold and the mean of the pixel values above it. Finally, take the mean of those two means and use it as the new threshold value; repeat until the threshold stops changing.
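A rough NumPy sketch of that scheme (it assumes the image values fall on both sides of the initial threshold):

```python
import numpy as np

def midpoint_threshold(img, t=127.0, eps=0.5):
    """Iterative midpoint thresholding: repeat until the threshold stops moving."""
    img = img.astype(np.float64)
    while True:
        below = img[img <= t]
        above = img[img > t]
        if below.size == 0 or above.size == 0:       # degenerate split, keep current t
            return t
        new_t = 0.5 * (below.mean() + above.mean())  # mean of the two class means
        if abs(new_t - t) < eps:
            return new_t
        t = new_t
```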
Thresholding can be computed by two techniques: global and adaptive. In global thresholding, a single value is used as the threshold for the entire image; in adaptive thresholding, the threshold is computed locally from each pixel's neighbourhood. Global methods work best when the image has a bimodal histogram.
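Both variants are available in cv2; a minimal sketch for comparison (the file name, block size 11, and constant 2 are just example values):

```python
import cv2

h = cv2.imread("display.png", cv2.IMREAD_GRAYSCALE)  # placeholder input

# Global: one threshold value for the whole image.
_, global_bin = cv2.threshold(h, 127, 255, cv2.THRESH_BINARY)

# Adaptive: the threshold is computed per pixel from its local neighbourhood.
adaptive_bin = cv2.adaptiveThreshold(h, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 11, 2)
```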
Balanced histogram thresholding (BHT), like Otsu's method and the iterative selection thresholding method, is a histogram-based thresholding technique. It assumes the image is divided into two main classes, the background and the foreground, and tries to find the optimum threshold level that separates the histogram into those two classes.
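A rough NumPy sketch of BHT (not a tuned implementation), assuming a 256-bin grayscale histogram such as the one from np.bincount:

```python
import numpy as np

def balanced_histogram_threshold(hist):
    """Repeatedly trim the heavier end of the histogram; the midpoint left
    over when the range collapses is the balance point (the threshold)."""
    i_s, i_e = 0, len(hist) - 1
    i_m = (i_s + i_e) // 2
    w_l = hist[i_s:i_m + 1].sum()       # weight of the left side
    w_r = hist[i_m + 1:i_e + 1].sum()   # weight of the right side
    while i_s < i_e:
        if w_r > w_l:                   # right side is heavier: trim it
            w_r -= hist[i_e]
            i_e -= 1
            if (i_s + i_e) // 2 < i_m:
                w_r += hist[i_m]
                w_l -= hist[i_m]
                i_m -= 1
        else:                           # left side is heavier: trim it
            w_l -= hist[i_s]
            i_s += 1
            if (i_s + i_e) // 2 > i_m:
                w_l += hist[i_m + 1]
                w_r -= hist[i_m + 1]
                i_m += 1
    return i_m

# Example usage with an equalized grayscale image h:
# t = balanced_histogram_threshold(np.bincount(h.ravel(), minlength=256))
```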
Check this: link. It doesn't really depend on density; it works because you separated the two maxima. The local maxima correspond to the main classes: the left local maximum is the foreground (text pixels) and the right local maximum is the background (white paper). The optimal threshold should separate these maxima as well as possible, and its value lies in the local-minimum region between the two local maxima.
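One way to act on that observation with NumPy: lightly smooth the histogram, locate the two dominant peaks, and take the minimum between them. This is only a sketch under the assumption that the histogram really is bimodal; min_separation is an invented parameter to keep the two peaks apart:

```python
import numpy as np

def valley_threshold(gray, smooth=5, min_separation=30):
    """Return the valley (local minimum) between the two dominant histogram peaks."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    # Light box-filter smoothing so small bumps are not mistaken for peaks.
    hist = np.convolve(hist, np.ones(smooth) / smooth, mode="same")
    p1 = int(hist.argmax())                             # dominant peak
    # Second peak: highest bin at least `min_separation` levels away from p1.
    mask = np.abs(np.arange(hist.size) - p1) >= min_separation
    p2 = int(np.where(mask, hist, -1.0).argmax())
    lo, hi = sorted((p1, p2))
    return lo + int(hist[lo:hi + 1].argmin())           # valley between the two peaks
```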
At first, I thought "well, just make a histogram of the indexes in which data appears", which would totally work, but I don't think that will actually solve the underlying problem you want to solve.
I think you're misinterpreting histogram equalization. What histogram equalization does is spread out the histogram in highly concentrated areas, so that if you bin the result with different bin sizes, you'll get more or less equal counts inside the bins. The only reason those values are dense is specifically because they appear less often in the image; histogram equalization spreads the other, more popular values out. And the reason that range works out well is, as you can see in the original grayscale histogram, that values between 190 and 220 are right where the image begins to get bright again, i.e., where there is a clear demarcation of bright values.
You can see the way equalizeHist works directly by plotting histograms with different bin sizes. For example, you can loop over bin sizes from 3 to 20.
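A sketch of such a loop (the image path is a placeholder, and overlaying the grayscale and equalized histograms is just one possible choice):

```python
import cv2
import matplotlib.pyplot as plt

g = cv2.imread("display.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
h = cv2.equalizeHist(g)

# Compare the grayscale and equalized histograms at several bin counts.
for bins in range(3, 21):
    plt.figure()
    plt.hist(g.ravel(), bins=bins, alpha=0.5, label="grayscale")
    plt.hist(h.ravel(), bins=bins, alpha=0.5, label="equalized")
    plt.legend()
    plt.title("bins = %d" % bins)
    plt.show()
```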
Edit: So just to be clear, what you want is this demarcated area between the lower bump and the higher bump in your original histogram. You don't need to use equalized histograms for this. In fact, this is exactly what Otsu thresholding (Otsu's method) does: it assumes the data follows a bimodal distribution and finds the threshold that best separates the two modes.
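In cv2 that's essentially a one-liner; a minimal sketch (the path is a placeholder, and Otsu is applied to the original grayscale image rather than the equalized one):

```python
import cv2

g = cv2.imread("display.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
# With THRESH_OTSU set, the fixed value (0 here) is ignored and the
# threshold that best separates the bimodal histogram is computed.
t, binary = cv2.threshold(g, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", t)
```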