
Automatic text translation

What tools or web services are available for machine translation of text?

For example

ENGLISH TEXT > SERVER or LIB > GERMAN TEXT

Libraries are also acceptable.

Is the Google language API the only one?

Mite Mitreski asked Sep 03 '10 18:09



1 Answer

Actually, just going with Google's translation API is probably the best and easiest thing to do.

Google's API is easy to use and, depending on the language pair being translated, its translation system is either as good as or much better than everything else.

Open Source Translation Packages

However, there are also some really good open source tools for machine translation. State-of-the-art packages include:

  • cdec (C++)
  • Joshua (Java)
  • Moses (C++)
  • Phrasal (Java) - soon to be released

Unlike translation APIs, you can use these tools without needing access to the Internet. More importantly, you can use these tools without running into any throttling or limits that the free APIs impose if you are trying to translate larger amounts of data.

Training Data

To use the open source machine translation packages, you'll need training data. If you're translating between English and German, or between some other European languages, you can use Philipp Koehn's Europarl parallel corpus.
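As a rough illustration of what preparing such data can look like, here is a minimal Python sketch. It assumes the corpus comes as two plain-text files with one sentence per line, where line N of one file is the translation of line N of the other. The function name `read_parallel` and the thresholds `max_len` and `max_ratio` are my own choices for illustration, not part of any of the toolkits above. It drops empty pairs, overly long sentences, and pairs whose lengths differ wildly, which is the kind of cleanup usually done before training.

```python
from itertools import zip_longest


def read_parallel(src_path, tgt_path, max_len=80, max_ratio=3.0):
    """Yield cleaned (source, target) sentence pairs from two line-aligned files.

    max_len and max_ratio are illustrative defaults, not recommended values.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        for s, t in zip_longest(src, tgt):
            # zip_longest pads the shorter file with None, so a mismatch
            # in line counts is detected instead of silently truncated.
            if s is None or t is None:
                raise ValueError("files have different numbers of lines")
            s_tok, t_tok = s.split(), t.split()
            # Skip pairs where either side is empty.
            if not s_tok or not t_tok:
                continue
            # Skip very long sentences.
            if len(s_tok) > max_len or len(t_tok) > max_len:
                continue
            # Skip pairs whose token counts are badly mismatched,
            # since those are often alignment errors.
            longer = max(len(s_tok), len(t_tok))
            shorter = min(len(s_tok), len(t_tok))
            if longer / shorter > max_ratio:
                continue
            # Normalize whitespace on the way out.
            yield " ".join(s_tok), " ".join(t_tok)
```

You would call it with the paths to the two halves of a language pair, for example `read_parallel("corpus.en", "corpus.de")` (hypothetical file names), and write the resulting pairs back out to whatever format your chosen toolkit expects.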

If you're interested in a European Union (EU) language that's not in the Europarl parallel corpus, you can gather the data yourself by crawling the proceedings of the European Parliament. The proceedings are translated into each of the EU languages and made available for free online, which makes them a very good source of machine translation training data.

dmcer answered Oct 16 '22 13:10
