I'm trying to find a good high-level explanation of how statistical machine translation works. That is, supposing I have a corpus of non-aligned English, French and German texts, how could I use that to translate any sentence from one language to another? I'm not looking to build a Google Translate myself, but I'd like to understand how it works in more detail.
I've searched Google but come across nothing good: everything either quickly requires advanced mathematics to follow or is far too general. Wikipedia's article on SMT manages to be both, so it doesn't really help much. I'm skeptical that this area is so complex that it simply can't be understood without all the mathematics.
Can anyone give, or does anyone know of, a general step-by-step explanation of how such a system works, targeted at programmers (so code examples are fine) but without needing a mathematics degree to understand? A book like this would be great too.
Edit: A perfect example of what I'm looking for would be an SMT equivalent of Peter Norvig's great article on spelling correction. That gives a good idea of what's involved in writing a spell checker, without going into detailed maths on Levenshtein/soundex/smoothing algorithms etc.
Here is a nice video lecture (in 2 parts):
http://videolectures.net/aerfaiss08_koehn_pbfs/
For in-depth details, I highly advise this book:
http://www.amazon.com/Statistical-Machine-Translation-Philipp-Koehn/dp/0521874157
Both are from the guy who created the most widely used MT system in research. The book covers all the fundamentals and is accurate and very well explained. It is probably one of the de facto standard books that any researcher starting out in this field should read.
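Since the question asks for something in the spirit of Norvig's spelling corrector, here is a rough sketch of one classic building block of SMT, word alignment with IBM Model 1, trained by expectation-maximization. It is not taken from the lecture or the book above, and it is far from a full translation system. It also assumes a sentence-aligned parallel corpus (each source sentence paired with its translation), which is not the non-aligned corpus described in the question. The function names and the three-sentence German/English toy corpus are made up for illustration.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t[(tgt, src)] = P(tgt | src).

    `pairs` is a list of (source_tokens, target_tokens) tuples, i.e. a
    sentence-aligned parallel corpus (an assumption of this sketch).
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start with no knowledge: every target word is equally likely.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (tgt, src) links
        total = defaultdict(float)   # expected count of links per src word

        # E-step: spread each target word over the source words in its
        # sentence, in proportion to the current probabilities.
        for src, tgt in pairs:
            for w in tgt:
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac

        # M-step: re-estimate probabilities from the expected counts.
        t = defaultdict(
            float, {(w, s): c / total[s] for (w, s), c in count.items()}
        )
    return t


def best_translation(t, src_word):
    """Return the target word with the highest P(tgt | src_word)."""
    candidates = {w: p for (w, s), p in t.items() if s == src_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus: German source, English target.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

model = train_ibm_model1(corpus)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

Nobody tells the program which word goes with which. It works it out because "das" keeps turning up alongside "the", and once "buch" is explained by "book", the leftover "ein" has to pair with "a". Real phrase-based systems build on alignments like these, then add phrase extraction, a language model for the target language, and a decoder that searches for the best-scoring output sentence.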