Getting weird markup from Google translate like ~~POS=TRUNC

Tags:

google-translate

I'm suddenly getting some strange markup when translating phrases with the Google Translate API via the Java library. Examples for English → Swedish include:

Vector graphics → vektor~~POS=TRUNC grafikk~~POS=HEADCOMP

Javascript → Javascript script~~POS=HEADCOMP

It looks like it's related to compound noun handling. Is this a feature of the API that I can deactivate somehow or is this a new bug on the server side?


asked Nov 09 '16 11:11

2 Answers

This looks like a bug in the server-side translator. I also get it on the web site: https://translate.google.com/#view=home&op=translate&sl=ru&tl=no&text=%D0%9E%D0%B1%D1%89%D0%B5%D0%B6%D0%B8%D1%82%D0%B8%D0%B5 gives me vandrer~~POS=TRUNC.

In NLP, "POS" means part of speech, and "HEADCOMP" sounds like it could be the head of a noun compound. I'm guessing they TRUNCate the non-head parts of compounds (which are practically never inflected). So Google Translate is spilling some of its internals. What's surprising is that such tags are the staple of rule-based/knowledge-based systems, whereas Google typically uses pure machine learning methods, shunning hard-coded knowledge. (One possibility is that they used a noun-compound analyser to expand their training set, which they then ran ML on, similar to how Systran & Koehn trained statistical MT on a parallel corpus translated with a rule-based MT system, but had a bug in the script that cleans up the tags before training.)

It'd be fun to find out which system they used, in case it was an open source one, but unfortunately the tags are practically ungoogleable, since the web is now littered with spammy machine translated (and non-post-edited) pages full of those tags.
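
Until this is fixed on the server side, one possible workaround is to strip the tags from the returned text on the client. Here is a minimal Java sketch, assuming every leaked tag has the form `~~POS=` followed by uppercase letters (only TRUNC and HEADCOMP show up in the examples above, so the general pattern is a guess). `PosTagStripper` and `strip` are hypothetical names for illustration, not part of any Google library:

```java
import java.util.regex.Pattern;

/** Hypothetical helper that removes leaked "~~POS=..." tags from translated text. */
final class PosTagStripper {

    // Assumption: a tag is "~~POS=" followed by one or more uppercase ASCII letters.
    private static final Pattern POS_TAG = Pattern.compile("~~POS=[A-Z]+");

    private PosTagStripper() {
        // Static utility class, not meant to be instantiated.
    }

    /** Returns the text with every matching tag removed; other characters are untouched. */
    static String strip(String translated) {
        return POS_TAG.matcher(translated).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample strings taken from the question above.
        System.out.println(strip("vektor~~POS=TRUNC grafikk~~POS=HEADCOMP"));
        System.out.println(strip("Javascript script~~POS=HEADCOMP"));
    }
}
```

This only removes the markup. It does not repair the translation itself, so the second example still comes out as "Javascript script".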


answered Oct 18 '22 22:10

unhammer

It seems to have to do with the way Google "translates" strings, returning what is statistically most likely to be correct. Common Unix commands might therefore end up in your translation.

More discussion about the topic: https://www.reddit.com/r/German/comments/47kfah/thanks_google/

answered Oct 19 '22 00:10

theBigBadBacon
