I'm making an automated text recognition script with Python on Ubuntu.
I'm using GOCR, and the recognition accuracy is too low.
Example:
Output: _O4_4E34E_4_O4_
I suspect the font in the image is too bold, so I'm asking whether there is a way to make it thinner using a Python library or a Linux command.
Tesseract does various image processing operations internally (using the Leptonica library) before doing the actual OCR. It generally does a very good job of this, but there will inevitably be cases where it isn't good enough, which can result in a significant reduction in accuracy.
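If the built-in preprocessing isn't good enough, you can binarize the image yourself before passing it to the OCR engine. Below is a minimal NumPy sketch of global thresholding with Otsu's method, a common binarization technique. It assumes an 8-bit grayscale image already loaded as a NumPy array; the function names `otsu_threshold` and `binarize` are hypothetical helpers, not part of any OCR library.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold t for an 8-bit (uint8) grayscale image.

    Pixels <= t form one class, pixels > t the other; t is chosen to
    maximize the between-class variance.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    levels = np.arange(256, dtype=np.float64)
    w0 = np.cumsum(hist)                 # pixel count of class 0 (<= t)
    w1 = total - w0                      # pixel count of class 1 (> t)
    sum0 = np.cumsum(hist * levels)      # intensity sum of class 0
    mu_total = sum0[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = sum0 / w0                  # mean intensity of class 0
        mu1 = (mu_total - sum0) / w1     # mean intensity of class 1
        between = w0 * w1 * (mu0 - mu1) ** 2
    # Thresholds that leave one class empty give NaN; treat them as 0.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0)
    return int(np.argmax(between))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

For example, an image that is half intensity 30 and half intensity 220 is split cleanly into 0 and 255. In OpenCV the same step is usually done with `cv2.threshold` and the `cv2.THRESH_OTSU` flag.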
You will probably need to apply a morphological operation such as "erosion" to your image, e.g. using OpenCV. This makes structures thinner, though at some cost in image quality.
See the OpenCV-Python tutorial on morphological transformations: https://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html
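To illustrate what erosion does, here is a minimal pure-NumPy sketch (not OpenCV's implementation) of binary erosion with a square kernel. One detail matters for OCR: erosion shrinks the *foreground* (the True/white pixels), so for dark text on a light background the text has to be treated as the foreground first. The helper `thin_dark_text` does that with a simple fixed threshold; its name and the default `threshold=128` are assumptions for illustration.

```python
import numpy as np


def erode(mask: np.ndarray, size: int = 3, iterations: int = 1) -> np.ndarray:
    """Binary erosion of a 2-D boolean mask with a size x size square kernel.

    A pixel stays True only if every pixel under the kernel is True.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    r = size // 2
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        # Pad with True so the image border itself does not erode text
        # that touches the edge of the image.
        padded = np.pad(out, r, mode="constant", constant_values=True)
        result = np.ones_like(out)
        for dy in range(size):
            for dx in range(size):
                result &= padded[dy:dy + h, dx:dx + w]
        out = result
    return out


def thin_dark_text(gray: np.ndarray, threshold: int = 128,
                   size: int = 3, iterations: int = 1) -> np.ndarray:
    """Thin dark text on a light background.

    Returns a uint8 image where 0 is ink and 255 is paper.
    """
    text_mask = gray < threshold          # dark pixels are the text
    thinned = erode(text_mask, size, iterations)
    return np.where(thinned, 0, 255).astype(np.uint8)
```

For instance, a 5x5 black square on a white 7x7 image shrinks to a 3x3 square with a 3x3 kernel. With OpenCV you would get a comparable result from `cv2.erode(img, np.ones((3, 3), np.uint8), iterations=1)` on an image where the text is white on black; for black text on white, either invert the image first (e.g. with `cv2.bitwise_not`) or use `cv2.dilate` instead. Increasing `iterations` or the kernel size thins more aggressively but can break thin strokes apart, which is the quality trade-off mentioned above.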