Wrong encoding with Google Cloud Translate and Java

I'm trying to use Google Cloud Translate. I think the problem is that Google Cloud Translate uses UTF-8 and the JVM uses UTF-16, so some characters come out wrong in the translations. For instance:

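    import com.google.cloud.translate.Translate;
    import com.google.cloud.translate.Translate.TranslateOption;
    import com.google.cloud.translate.TranslateOptions;
    import com.google.cloud.translate.Translation;
    import org.apache.commons.lang.StringEscapeUtils;

    public class TranslateExample {
      public static void main(String... args) throws Exception {
        // Instantiates a client
        Translate translate = TranslateOptions.getDefaultInstance().getService();

        // The text to translate
        String text = "Bonjour, à qui dois-je répondre? Non, C'est l'inverse...";

        // Translates the text from French into English
        Translation translation =
            translate.translate(
                text,
                TranslateOption.sourceLanguage("fr"),
                TranslateOption.targetLanguage("en"));

        System.out.printf("Text: %s%n", text);
        // The translated text comes back HTML-escaped, so unescape it before printing
        System.out.printf("Translation: %s%n",
            StringEscapeUtils.unescapeHtml(translation.getTranslatedText()));
      }
    }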

Without the unescapeHtml call, this printed:

Translation: Hello, who should I answer? No, it&#39;s the opposite ...

instead of:

Translation: Hello, who should I answer? No, it's the opposite ...

We can't change the encoding of a Java String, and the Google Cloud API only accepts a String (not, for example, a byte[]).
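For what it's worth, a Java String can be turned into UTF-8 bytes and back without losing anything, which suggests the UTF-8 vs UTF-16 difference is not what mangles the apostrophe. A minimal JDK-only sketch (the class name Utf8RoundTrip is made up for illustration):

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Encodes a String as UTF-8 bytes, then decodes those bytes back into a String.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same sentence as the question, with the accented letters written as escapes.
        String text = "Bonjour, \u00e0 qui dois-je r\u00e9pondre? Non, C'est l'inverse...";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // The two accented letters need two bytes each in UTF-8, so there are more bytes than chars.
        System.out.println("chars: " + text.length() + ", UTF-8 bytes: " + utf8.length);
        // The decoded String is identical to the original: nothing is lost.
        System.out.println("round trip equal: " + roundTrip(text).equals(text));
    }
}
```

Running it prints chars: 56, UTF-8 bytes: 58 followed by round trip equal: true, so the text survives the conversion intact.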

Does anyone know how to fix it?

Thank you for reading.

Edit: This code now works. I added StringEscapeUtils.unescapeHtml from the Apache Commons Lang dependency. I don't know if there is another way to do it.
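If pulling in Apache Commons is not an option, a small JDK-only decoder can handle numeric character references such as &#39;. This is a hypothetical helper (EntityDecoder is not part of any library) that knows only a few named entities, so treat it as a sketch rather than a full HTML unescaper:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecoder {
    // Only a handful of named entities; a full HTML unescaper knows hundreds.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    // Matches decimal (&#39;), hexadecimal (&#x27;) and named (&amp;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z]+);");

    static String decode(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = decodeNumeric(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = decodeNumeric(body.substring(1), 10, m.group());
            } else {
                // Unknown named entities are left untouched.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            // quoteReplacement keeps dollar signs and backslashes from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts a numeric reference to its character; returns the original text if it is invalid.
    private static String decodeNumeric(String digits, int radix, String original) {
        try {
            return new String(Character.toChars(Integer.parseInt(digits, radix)));
        } catch (IllegalArgumentException e) { // also covers NumberFormatException
            return original;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("Hello, who should I answer? No, it&#39;s the opposite ..."));
    }
}
```

The main method prints Hello, who should I answer? No, it's the opposite ... and EntityDecoder.decode(translation.getTranslatedText()) could stand in for the unescapeHtml call. Requesting plain text from the API avoids the escaping altogether, which is generally the cleaner fix.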

asked Feb 15 '18 at 11:02 by DeepProblems




1 Answer

Even though you already found a solution, here is another fix that does not require an additional library.

By default, the translate method returns an HTML-encoded String, which is why the unescapeHtml call was needed. It can return a plain-text String instead if you pass the matching TranslateOption in the method call.

The call then looks something like this:

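    // from and to hold language codes, e.g. "fr" and "en"
    Translation translation = translate.translate(
            text,
            Translate.TranslateOption.sourceLanguage(from),
            Translate.TranslateOption.targetLanguage(to),
            // ask for plain text instead of the default HTML-encoded result
            Translate.TranslateOption.format("text")
    );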
answered Oct 16 '22 at 18:10 by nigood