In artificial intelligence methods we have two stages: training and testing.
In the training stage we give a huge amount of data to a system, and we normally test it with a smaller volume of data. Then we evaluate the output.
Now the question is: can this training be done with the built-in functionality of GIZA++, or should we write a separate application for it?
If we need a separate application, can anybody suggest one that has already been written? Or a manual? Note: I want an alignment program, not a statistical machine translation system.
I would prefer to train in GIZA++ so that I can test with unobserved data.
Thanks in advance.
Train/Test is a method for measuring the accuracy of your model. It is called Train/Test because you split the data set into two sets, a training set and a testing set, for example 80% for training and 20% for testing. You train the model using the training set.
Training set: a subset used to train the model. Test set: a subset used to test the trained model.
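For a word aligner the data is a sentence-aligned parallel corpus, so the split has to keep each source sentence together with its translation. The following is a minimal sketch of an 80/20 split under that assumption; the function name `split_parallel_corpus` is hypothetical, and writing the two halves back to files in whatever format your aligner expects is not shown.

```python
import random


def split_parallel_corpus(src_lines, tgt_lines, train_ratio=0.8, seed=0):
    """Split a sentence-aligned corpus into train and test sentence pairs.

    Hypothetical helper: pairs are shuffled as units, so every source line
    stays paired with its translation after the split.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    pairs = list(zip(src_lines, tgt_lines))
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]


if __name__ == "__main__":
    src = [f"src sentence {i}" for i in range(10)]
    tgt = [f"tgt sentence {i}" for i in range(10)]
    train, test = split_parallel_corpus(src, tgt)
    print(len(train), len(test))  # 8 2
```

Shuffling before cutting avoids putting, say, all of one document into the test set, which would make the test data less representative of the training data.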
By using similar data for training and testing, you can minimize the effects of data discrepancies and better understand the characteristics of the model. After a model has been trained on the training set, you test it by making predictions against the test set.
The main difference between training data and testing data is that training data is the subset of original data that is used to train the machine learning model, whereas testing data is used to check the accuracy of the model. The training dataset is generally larger in size compared to the testing dataset.
This should be a good starting point for training a baseline MT system using Moses.
Normally GIZA++ is used for word-aligning your parallel corpus.
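When the goal is an alignment program rather than a full MT system, testing usually means comparing the predicted word links on held-out sentences against a hand-aligned reference. One common metric for this is alignment error rate (AER), which uses "sure" links S and "possible" links P (with S a subset of P) and is defined as AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|) for predicted links A. The sketch below assumes each alignment is already a set of `(source_index, target_index)` pairs; the function name is hypothetical, and converting GIZA++ output files into such sets is not shown.

```python
def alignment_error_rate(predicted, sure, possible):
    """Alignment error rate for one test set (hypothetical helper).

    Each argument is a set of (source_index, target_index) links.
    Lower is better; 0.0 means the prediction matches the reference.
    """
    possible = possible | sure  # sure links always count as possible
    if not predicted and not sure:
        return 0.0
    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)
    return 1.0 - (hits_sure + hits_possible) / (len(predicted) + len(sure))


if __name__ == "__main__":
    predicted = {(1, 1), (2, 2), (3, 4)}
    sure = {(1, 1), (2, 2)}
    possible = {(3, 3)}
    print(round(alignment_error_rate(predicted, sure, possible), 4))  # 0.2
```

Links from all test sentences can be pooled into one set (for example by prefixing each pair with a sentence id) so that a single AER is reported for the whole test set.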
You'll need some other tools such as IRSTLM, SRILM, or KenLM for language model estimation.