I have a project that makes use of the Google Vision API DOCUMENT_TEXT_DETECTION feature to extract text from document images.
The API often has trouble recognizing single digits, as you can see in this image:
I suspect the problem is related to some noise-removal step that treats isolated single digits as noise. Is there a way to improve the Vision response in these situations (for example, by adjusting a noise threshold or other parameters)?
At other times Vision confuses digits with letters:
But if I specify the parameter languageHints = 'en' or 'mt', these digits are ignored by the OCR. Is there a way to force the recognition of digits or Latin characters?
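For context, the request looks roughly like the sketch below, written with the Google Cloud Vision Python client; the file path and hint values are illustrative, not my exact setup:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Load the document image (path is illustrative).
with open("document.png", "rb") as f:
    image = vision.Image(content=f.read())

# DOCUMENT_TEXT_DETECTION with language hints; with 'en' or 'mt'
# set here, isolated single digits sometimes disappear from the result.
response = client.document_text_detection(
    image=image,
    image_context={"language_hints": ["en"]},
)

print(response.full_text_annotation.text)
```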
Optical character recognition (OCR) is a form of image processing that extracts text from a given image, such as a photo of a document. Various applications and technologies have been developed to aid this process, from Adobe Acrobat to ML-based tools such as Tesseract OCR.
Unfortunately, I think the Vision API is optimized for the two ends of the spectrum: dense text (DOCUMENT_TEXT_DETECTION) on one end, and arbitrary bits of text (TEXT_DETECTION) on the other. As you noted in the comments, the regular TEXT_DETECTION works better for these stray single digits, while DOCUMENT_TEXT_DETECTION works better overall.
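If it helps, one pragmatic workaround is to run both feature types on the same image and merge the results, using TEXT_DETECTION to recover the stray digits. This is a sketch with the Python client (file path is a placeholder), not an official recommendation:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("document.png", "rb") as f:  # path is illustrative
    image = vision.Image(content=f.read())

# Dense-layout pass: better overall structure and reading order.
doc_response = client.document_text_detection(image=image)
doc_text = doc_response.full_text_annotation.text

# Sparse-text pass: tends to pick up isolated single digits.
sparse_response = client.text_detection(image=image)
# text_annotations[0] is the full block; the rest are individual words.
sparse_words = [a.description for a in sparse_response.text_annotations[1:]]

# Anything found only by TEXT_DETECTION (e.g. a stray digit) can then
# be merged back into the document result, e.g. by bounding box.
missing = [w for w in sparse_words if w not in doc_text]
print("Found only by TEXT_DETECTION:", missing)
```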
As far as I've heard, there are no current plans to try to cover both of these in a single way, but it's possible that this could improve in the future.
I think there have been other requests to do more fine-tuning and hinting on what you're looking to detect (e.g., here and here), but this doesn't seem to be available yet. Perhaps in the future you'll be able to provide more hints on the format of the text that you're looking to find in images (e.g., phone numbers, single digits, etc).