Natural Language Processing - Word Alignment

Tags: alignment, nlp, linguistics

I am looking for word alignment tools and algorithms.
I am working with bilingual English-Hindi text, and I am currently working with:

DTW (Dynamic Time Warping) algorithm
CLA (Competitive Linking Algorithm)
NATools
GIZA++

Could you please suggest any other language-independent algorithm or tool that can perform statistical word alignment on a parallel English-Hindi corpus, and a way to evaluate the resulting alignments?
I have heard that some tools work best for certain languages. How true is that? If it is true, could you give an example of a tool that is better suited to Asian languages like Hindi? Counter-examples of tools I shouldn't use for such languages are also welcome.

I have heard a bit about the Uplug word aligner. Could someone tell me whether this tool would be useful for my purpose?

Thank you.. :)

asked Mar 11 '10 14:03

boddhisattva

2 Answers

The Berkeley Aligner is very good. By doing joint training of the IBM word alignment models, it's able to get a much lower alignment error rate (AER) than older packages like GIZA++.
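Since AER comes up here and the question also asks about evaluation, here is a minimal sketch of how AER is usually computed against a hand-annotated gold standard. It assumes the common convention of "sure" links S and "possible" links P (with S contained in P), and that each link is a (source index, target index) pair. The function name and the toy links are made up for illustration; they are not taken from any aligner's API.

```python
def alignment_error_rate(predicted, sure, possible=None):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    predicted: links A proposed by the aligner
    sure:      gold links S that annotators agreed on
    possible:  additional ambiguous gold links; P is taken as S plus these
    """
    a = set(predicted)
    s = set(sure)
    p = s | set(possible or ())  # enforce the S-subset-of-P convention
    if not a and not s:
        return 0.0  # nothing to align and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy gold standard and aligner output (hypothetical index pairs).
sure = {(0, 0), (1, 2)}
possible = {(2, 1)}
predicted = {(0, 0), (1, 2), (2, 1), (3, 3)}

aer = alignment_error_rate(predicted, sure, possible)
print(f"AER = {aer:.3f}")
```

Lower is better: 0 means every predicted link is at least "possible" and every "sure" link was found. Links that are only "possible" are not penalised when predicted and not required when missing.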

It also supports some more advanced features such as syntactic distortion (i.e., using parse tree information to get better alignments). For this, you only need parse trees for one language in the pair. So you should be okay doing Hindi<->English, since there are plenty of good, freely available English parsers.

If you decide not to go with the Berkeley Aligner, you should probably just use GIZA++. For years, it has been essentially the standard word aligner in the machine translation community.
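To give a feel for what these statistical aligners estimate, here is a rough sketch of EM training for IBM Model 1, the simplest of the IBM models that GIZA++ implements. This is a toy illustration under my own assumptions, not GIZA++ or Berkeley Aligner code: the function names, the `<NULL>` token, and the three-sentence romanized Hindi-English corpus are all invented for the example. Nothing in it is specific to a language pair, which is why these models are usually considered language independent.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical token for "aligned to nothing"


def train_ibm_model1(bitext, iterations=10):
    """Estimate t[(f, e)], roughly P(f | e), from (f_tokens, e_tokens) pairs."""
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        for fs, es in bitext:
            es_null = [NULL] + list(es)
            for f in fs:
                # E-step: spread one unit of f over all candidate e words.
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per e word.
        t = defaultdict(float, {(f, e): c / total[e]
                                for (f, e), c in count.items()})
    return t


def align(fs, es, t):
    """Link each f word to its most probable e word, or to nothing."""
    links = set()
    for j, f in enumerate(fs):
        best_i, best_p = None, t.get((f, NULL), 0.0)
        for i, e in enumerate(es):
            p = t.get((f, e), 0.0)
            if p > best_p:
                best_i, best_p = i, p
        if best_i is not None:
            links.add((j, best_i))
    return links


# Toy parallel corpus: romanized Hindi on the left, English on the right.
bitext = [
    (["yah", "ghar"], ["this", "house"]),
    (["yah", "kitab"], ["this", "book"]),
    (["ek", "kitab"], ["a", "book"]),
]
t = train_ibm_model1(bitext)
links = align(["yah", "kitab"], ["this", "book"], t)
print(sorted(links))
```

Real tools add a lot on top of this (the HMM and fertility models, smoothing, and symmetrisation of the two directions), so treat it only as a way to see the core idea on a few sentences.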

answered Oct 09 '22 05:10

dmcer

Uplug is a great tool; I have been using it to align English<->Macedonian texts. It essentially builds on GIZA++ by adding so-called clue alignments. Its advanced setting combines the clue alignments with GIZA++ and performs three such iterations. The more clues (POS tags, lemmas, ...) you provide, the better the results will be. But I have to mention that you should not expect fundamentally different results than you would get by just using GIZA++.

Anyway, if you plan to study SMT (statistical machine translation) seriously, I suggest that you read the PhD thesis about Uplug. It will be very beneficial for you.

answered Oct 09 '22 06:10

msaveski
