
Google Vision API does not recognize single digits

I have a project that makes use of the Google Vision API's DOCUMENT_TEXT_DETECTION feature to extract text from document images.

The API often has trouble recognizing single digits, as you can see in this image:

enter image description here

I suppose the problem could be related to a noise-removal algorithm that treats isolated single digits as noise. Is there a way to improve the Vision response in these situations (for example, by adjusting a noise threshold or other parameters)?

At other times Vision confuses digits with letters:

enter image description here

But if I specify the parameter languageHints = 'en' or 'mt', these digits are ignored by the OCR. Is there a way to force the recognition of digits or Latin characters?

Davide Biraghi asked Mar 20 '18 14:03



1 Answer

Unfortunately I think the Vision API is optimized for both ends of the spectrum -- dense text (DOCUMENT_TEXT_DETECTION) on one end, and arbitrary bits of text (TEXT_DETECTION) on the other. As you noted in the comments, the regular TEXT_DETECTION works better for these stray single digits while DOCUMENT_TEXT_DETECTION works better overall.
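One workaround, given that trade-off, is to run both modes on the same image and add the stray words that only TEXT_DETECTION finds to the DOCUMENT_TEXT_DETECTION result. The sketch below is only an illustration under a few assumptions: the helper names (build_batch_request, merge_words) are made up for this example, the dictionaries follow the JSON shape of the REST images:annotate request and its textAnnotations entries, and the "no overlapping box" rule is a simple heuristic of mine, not anything the API provides. The two modes go into separate entries of the batch because, as far as I know, DOCUMENT_TEXT_DETECTION takes precedence when both features are listed in a single request.

```python
import base64


def build_batch_request(image_bytes, language_hints=None):
    """Build an images:annotate body with one request per OCR mode."""
    content = base64.b64encode(image_bytes).decode("ascii")
    requests = []
    for feature_type in ("DOCUMENT_TEXT_DETECTION", "TEXT_DETECTION"):
        request = {
            "image": {"content": content},
            "features": [{"type": feature_type}],
        }
        # Hints are optional; the question reports that they can hide digits.
        if language_hints:
            request["imageContext"] = {"languageHints": list(language_hints)}
        requests.append(request)
    return {"requests": requests}


def _box(annotation):
    """Return (left, top, right, bottom) from an annotation's boundingPoly."""
    vertices = annotation["boundingPoly"]["vertices"]
    # A coordinate equal to 0 may be omitted in the JSON, so default to 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def _overlaps(a, b):
    """True if two (left, top, right, bottom) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_words(document_words, text_words):
    """Keep every document-mode word, plus text-mode words overlapping none of them."""
    merged = list(document_words)
    document_boxes = [_box(word) for word in document_words]
    for word in text_words:
        box = _box(word)
        if not any(_overlaps(box, other) for other in document_boxes):
            merged.append(word)
    return merged
```

To use it, you would POST the body to the images:annotate endpoint and pass each response's textAnnotations list, minus its first entry (which holds the whole text rather than a single word), to merge_words. Whether the recovered digits are trustworthy still needs checking against your own images.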

As far as I've heard, there are no current plans to try to cover both of these in a single way, but it's possible that this could improve in the future.

I think there have been other requests to do more fine-tuning and hinting on what you're looking to detect (e.g., here and here), but this doesn't seem to be available yet. Perhaps in the future you'll be able to provide more hints on the format of the text that you're looking to find in images (e.g., phone numbers, single digits, etc).
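Until something like that exists, the closest substitute I can think of is filtering the recognized words on the client side against the format you expect. This is a minimal sketch, assuming you already have the recognized words as plain strings; filter_by_format and the single-digit pattern are hypothetical names chosen for this example, not part of the Vision API.

```python
import re

# Assumed target format for this example: exactly one ASCII digit.
SINGLE_DIGIT = re.compile(r"[0-9]")


def filter_by_format(words, pattern=SINGLE_DIGIT):
    """Return only the words that match the expected format in full."""
    return [word for word in words if pattern.fullmatch(word)]
```

This only discards words that do not fit the format. It cannot recover a digit that the OCR missed or read as a letter.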

JJ Geewax answered Oct 17 '22 06:10