I am working on an OCR project whose goal is to read the stamped-in serial number off of a metal plate:
I am using OpenCV to prepare the image for OCR, and using Tesseract for the OCR itself. This is the ideal process:
My current process is this:
However, I am having very limited success. My main questions are:
This may not be a complete solution, but I hope it helps.
I have been working on a similar scenario, where I wanted to extract text from embossed metal.
My approach is similar to yours:
What I have noticed is that Tesseract works better when the text is black and the background is white (which is why I do the 7th step).
You can see the code and results of my work here: https://github.com/DevashishPrasad/Embossed-Text-Reader
I would also like to mention that the result depends heavily on the Canny threshold values and on your image. Low thresholds find more edges and high thresholds find fewer. More edges introduce noise into the image, while fewer edges may fail to capture a whole digit, so the thresholds have to be tuned for your particular images.