
Open Source Machine Translation Engines?

We're looking for an open source Machine Translation Engine that could be incorporated into our localization workflow. We're considering the following options:

  1. Moses (C++)
  2. Joshua (Java)
  3. Phrasal (Java)

Among these, Moses has the widest community support and has been tried out by many localization companies and researchers. We are actually leaning towards a Java-based engine since our applications are all in Java. Have any of you used either Joshua or Phrasal as part of your workflow? Could you please share your experiences with them? Or is Moses too far ahead of these in terms of the features it provides and ease of integration?

We also require that the engine support:

  1. Domain-specific training (i.e. it should maintain a separate phrase table for each domain that the input data belongs to).
  2. Incremental training (i.e. avoiding having to retrain the model from scratch every time we wish to use some new training data).
  3. Parallelizing the translation process.

Sam asked Jul 02 '12 23:07


1 Answer

This question is better asked on the Moses mailing list ([email protected]), I think. There are lots of people there working with different types of systems, so you'll get an objective answer. Apart from that, here's my input:

  • With respect to Java: it does not matter in which language the MT system is written. No offense, but you may safely assume that even if the code were written in a language you are familiar with, it would be too difficult to understand without a deeper knowledge of MT. So what you are looking for are interfaces. Moses's XML-RPC interface works fine.
  • With respect to MT systems: look for the best results and ignore the programming language the system is written in. Results are here: matrix.statmt.org. The people using your MT system are interested in the output, not in your coding preferences.
  • With respect to the whole venture: once you start offering MT output, make sure you can adapt it quickly. MT is rapidly shifting towards a pipeline process in which an MT system is the core (and not the only) component. So focus on maintainability. In the ideal case, you would be able to connect any MT system to your framework.
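
To illustrate the "connect any MT system to your framework" point, here is a minimal Java sketch of an engine-neutral interface with a small registry in front of it. All names (TranslationEngine, UppercaseEngine, EngineRegistry) are hypothetical and do not come from Moses, Joshua, or Phrasal; the uppercase backend is only a stand-in for a real adapter, which might for instance call a Moses server over XML-RPC.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical engine-neutral interface: the rest of the workflow depends
// only on this, never on a specific MT system.
interface TranslationEngine {
    String translate(String sourceSentence);
}

// Stand-in backend for illustration only. A real adapter would forward the
// sentence to an actual MT system and return its output.
class UppercaseEngine implements TranslationEngine {
    @Override
    public String translate(String sourceSentence) {
        return sourceSentence.toUpperCase(Locale.ROOT);
    }
}

// Picks an engine per key (for example, per customer) and falls back to a
// default engine when no specific one is registered.
class EngineRegistry {
    private final Map<String, TranslationEngine> engines = new HashMap<>();
    private final TranslationEngine fallback;

    EngineRegistry(TranslationEngine fallback) {
        this.fallback = fallback;
    }

    void register(String key, TranslationEngine engine) {
        engines.put(key, engine);
    }

    String translate(String key, String sentence) {
        return engines.getOrDefault(key, fallback).translate(sentence);
    }
}
```

With this shape, swapping Moses for another system means writing one new adapter class; the calling code stays the same.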

And here's some input on your feature requests:

  • Domain-specific training: you don't need that feature. You get the best MT results by training on customer-specific data.
  • Incremental training: see Stream Based Statistical Machine Translation
  • Parallelizing the translation process: you will have to implement this yourself. Note that most MT software is purely academic and will never reach a 1.0 milestone. It helps, of course, if a multi-threaded server is available (Moses), but even then you will need lots of harnessing code.
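
As a rough idea of what the harnessing code for parallel translation can look like, here is a minimal Java sketch that fans sentences out over a fixed thread pool and collects the results in input order. ParallelTranslator and the translateOne function are hypothetical; the sketch assumes the backend call is thread-safe (for example, a request to a multi-threaded server), which you would have to verify for your engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelTranslator {
    // translateOne is a placeholder for a thread-safe call into the MT backend.
    static List<String> translateAll(List<String> sentences,
                                     Function<String, String> translateOne,
                                     int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per sentence; the futures keep the input order.
            List<Future<String>> pending = new ArrayList<>();
            for (String sentence : sentences) {
                Callable<String> task = () -> translateOne.apply(sentence);
                pending.add(pool.submit(task));
            }
            // Wait for each result in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while translating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Translation task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A production harness would also need timeouts, retries, and batching, which is the kind of extra code the answer refers to.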

Hope this helps. Feel free to PM me if you have any more questions.

jvdbogae answered Nov 02 '22 06:11