
Specifying the document language in Google Document AI API

I'm trying to parse a handwritten document with Google Cloud Document AI. The document contains Cyrillic characters, but Document AI occasionally recognizes words as Latin characters. Is there a way to specify the language of the document, so that it recognizes words in that particular language regardless of the confidence?

asked May 14 '26 at 04:05 by Yuriy Chachora


2 Answers

The languages supported by Document AI are listed in its documentation.

Currently it's not possible to specify a language so that the words in the document are recognized in that particular language. Document AI can only detect the language.

If you want the ability to specify the document language to be implemented, you can open a feature request on the issue tracker describing your requirement.

answered May 17 '26 at 13:05 by Prajna Rai T


A recent update to Document AI added support for the languageHints parameter, which allows you to specify the language of a document. Note: at this time, this only works with the v1beta3 endpoint and the Document OCR processor.

If the language is supported, provide its BCP-47 code (for example, ru for Russian) in the processOptions field when sending the processing request.
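A minimal sketch of what such a request body could look like, built with only the Python standard library for the REST :process method. Assumptions (not confirmed by this answer): the nested path processOptions.ocrConfig.hints.languageHints, the helper names process_endpoint and build_process_request, and the project, location and processor IDs are all illustrative/hypothetical; check the current v1beta3 reference before relying on them.

```python
import base64
import json

# ASSUMPTION: the field path processOptions.ocrConfig.hints.languageHints
# matches the Document OCR processor schema; verify against the v1beta3 reference.


def process_endpoint(project_id: str, location: str, processor_id: str) -> str:
    """Build the v1beta3 :process URL for a processor (hypothetical helper)."""
    return (
        f"https://{location}-documentai.googleapis.com/v1beta3/"
        f"projects/{project_id}/locations/{location}/processors/{processor_id}:process"
    )


def build_process_request(content: bytes, mime_type: str, language_hints: list[str]) -> dict:
    """Build a JSON request body with BCP-47 language hints (hypothetical helper)."""
    return {
        "rawDocument": {
            # Raw bytes must be base64-encoded inside a JSON body.
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        },
        "processOptions": {
            "ocrConfig": {
                # BCP-47 codes, e.g. "ru" for Russian.
                "hints": {"languageHints": list(language_hints)},
            },
        },
    }


if __name__ == "__main__":
    # Placeholder IDs and document bytes, for illustration only.
    body = build_process_request(b"%PDF-1.4 ...", "application/pdf", ["ru"])
    print(process_endpoint("my-project", "us", "my-processor-id"))
    print(json.dumps(body["processOptions"]))
```

Sending it would then be an authenticated POST of this JSON to the URL returned by process_endpoint; the official client libraries should expose the same options, but that is not shown here.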

answered May 17 '26 at 14:05 by Holt Skinner


