I am searching for an offline, open source translator/API. The translation does not have to be good, just good enough for analyzing keywords.
I am writing a Firefox plug-in that computes the likelihood that a website is trash. It works like a spam filter, by counting 'evil' words. It works for English, but I am German, and the German language is more difficult for a computer. My idea is to 'simplify' the language by translating it into English and analyzing the English text. There was the Google Translate API, but now you have to pay for it. I know that there are other programs to translate websites:
https://stackoverflow.com/questions/6151668/alternative-to-google-translate-api
They all have one problem: you send the HTML code of a website to a server, the server translates the text for you and sends it back. This increases traffic and slows things down. The owner of the server also won't like you.
That's why I am searching for an offline, open source translator. The translation does not have to be good, just good enough for analyzing keywords. Just using a dictionary and translating word by word won't work.
Example: 'Ich bringe Dich um' means 'I kill you'. If you translate it word by word, you'll get 'I bring you around.' This translation sounds weird, but not evil. The problem is that 'bringe ... um' is really just one word (the separable verb 'umbringen').
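To make the failure concrete, here is a minimal sketch of word-by-word translation followed by keyword counting. The tiny dictionary, the 'evil' word list, and both function names are made up for illustration; they are not taken from the plug-in.

```python
# Hypothetical toy data, for illustration only.
WORD_DICT = {"ich": "I", "bringe": "bring", "dich": "you", "um": "around"}
EVIL_WORDS = {"kill"}


def translate_word_by_word(sentence):
    """Translate each word on its own; unknown words are passed through."""
    return " ".join(WORD_DICT.get(word.lower(), word) for word in sentence.split())


def evil_score(text):
    """Count how many words of the text are in the 'evil' word list."""
    words = (word.strip(".,!?'\"") for word in text.lower().split())
    return sum(1 for word in words if word in EVIL_WORDS)


naive = translate_word_by_word("Ich bringe Dich um")
print(naive)                      # I bring you around
print(evil_score(naive))          # 0, the threat is missed
print(evil_score("I kill you"))   # 1, a correct translation is caught
```

The naive translation scores zero because 'bringe' and 'um' are looked up separately, which is exactly why a plain dictionary is not enough here.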
I want to avoid German because of nasty problems like this (and there are many more problems: 23 different rules for plurals, etc.). I think the programmers who invented automatic translation have already faced and solved these problems.
Apache Joshua (Incubating) may be the solution for you.
You just have to download the language pack you want and run it as a server. As they say:
A key feature is that there are no dependencies (apart from Java 8). Getting a machine translation system running on your own machine is as easy as downloading the tarball, unpacking it, and running the included shell script.
All you have to do next is make a web query to obtain translations (localhost:5674/translate?meta=list_weights&q=cifra+inferior+a+lo+que+predec%C3%ADan+las+encuestas+%2C+que+pronosticaban+de+mas+del+60+%25+de+participaci%C3%B3n+electoral+.&q=yo+quiero+taco+bell), and you will get the response with the translated text as JSON.
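As a rough sketch, such a query could be built and sent from Python like this. The port and the `q` parameter come from the example URL above; the `http` scheme, the function names, and the timeout are assumptions, and the layout of the returned JSON is not assumed here, so the parsed object is returned as-is.

```python
import json
import urllib.parse
import urllib.request

# Host and port as in the example URL above; the http scheme is an assumption.
BASE_URL = "http://localhost:5674/translate"


def build_translate_url(sentences):
    """Build a query URL with one q=... parameter per sentence."""
    query = urllib.parse.urlencode([("q", sentence) for sentence in sentences])
    return BASE_URL + "?" + query


def translate(sentences):
    """Send the query to a locally running server and return the parsed JSON."""
    with urllib.request.urlopen(build_translate_url(sentences), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `translate(["yo quiero taco bell"])` would return whatever JSON the server produces; inspect it once to see where the translated text sits before extracting it.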