
What's a good explanation of statistical machine translation?

I'm trying to find a good high-level explanation of how statistical machine translation works. That is, supposing I have a corpus of non-aligned English, French and German texts, how could I use that to translate any sentence from one language to another? It's not that I'm looking to build a Google Translate myself, but I'd like to understand how it works in more detail.

I've searched Google but come across nothing good: everything either quickly requires advanced mathematics to understand or is far too general. Wikipedia's article on SMT manages to be both, so it doesn't really help much. I'm skeptical that this is such a complex area that it simply can't be understood without all the mathematics.

Can anyone give, or point me to, a general step-by-step explanation of how such a system works, aimed at programmers (so code examples are fine) but without needing a mathematics degree to understand? A book along these lines would be great too.

Edit: A perfect example of what I'm looking for would be an SMT equivalent to Peter Norvig's great article on spelling correction. That gives a good idea of what's involved in writing a spell checker, without going into detailed maths on Levenshtein/soundex/smoothing algorithms etc.

Michael Low asked Apr 28 '11 07:04



1 Answer

Here is a nice video lecture (in 2 parts):

http://videolectures.net/aerfaiss08_koehn_pbfs/

For in-depth details, I highly recommend this book:

http://www.amazon.com/Statistical-Machine-Translation-Philipp-Koehn/dp/0521874157

Both are from the guy who created the most widely used MT system in research. The book covers all the fundamentals, and is accurate and very well explained. It is probably one of the de facto standard books that any researcher starting out in this field should read.
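To give a programmer-level taste of the kind of fundamentals involved, here is a minimal sketch of IBM Model 1, a classic word-based translation model trained with expectation-maximization. It is only an illustration under some loud assumptions: it needs a sentence-aligned parallel corpus (unlike the non-aligned texts in the question), it ignores word order, the NULL word and any language model, and the names `train_ibm_model1` and `best_translation` are made up for this example. Real systems are far more elaborate.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(e, f)], an approximation of P(english word e | foreign word f).

    pairs: list of (foreign_words, english_words) sentence pairs.
    Hypothetical helper for illustration only.
    """
    english_vocab = {e for _, es in pairs for e in es}
    # Start with every English word equally likely for every foreign word.
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times f is aligned to e
        total = defaultdict(float)  # expected number of times f is aligned at all
        # E-step: spread each English word's "vote" over the foreign words
        # in the same sentence pair, in proportion to the current t.
        for fs, es in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: turn the expected counts back into probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

def best_translation(t, f):
    """Return the English word with the highest estimated probability for f."""
    candidates = {e: p for (e, f2), p in t.items() if f2 == f}
    return max(candidates, key=candidates.get)

# Toy German-English corpus, assumed to be already sentence-aligned.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(t, word))
```

Nobody tells the model which word goes with which. It works it out because "das" keeps co-occurring with "the" and "buch" with "book", and each EM iteration sharpens those guesses. A full SMT system layers phrase tables, reordering and a target-language model on top of this idea.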

dagnelies answered Sep 22 '22 16:09
