
How can I do the training and testing steps in GIZA++?

Artificial intelligence methods usually work in two stages: training and testing.

In the training stage we give the system a large amount of data. In the testing stage we normally run it on a smaller volume of data it has not seen, and then we evaluate the output.
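To make the split concrete, here is a minimal sketch of dividing a parallel corpus into a training part and a held-out test part. It assumes the corpus is held as two equal-length lists of sentences; the function name `split_parallel`, the 10% default test fraction, and the fixed seed are illustrative choices of mine and are not part of GIZA++.

```python
import random


def split_parallel(src_lines, tgt_lines, test_fraction=0.1, seed=0):
    """Split a sentence-aligned corpus into (train, test) lists of pairs.

    Hypothetical helper for illustration only; not a GIZA++ function.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")

    # Keep each source sentence tied to its translation while shuffling.
    pairs = list(zip(src_lines, tgt_lines))
    random.Random(seed).shuffle(pairs)

    # The first n_test shuffled pairs become the held-out test set.
    n_test = int(len(pairs) * test_fraction)
    test = pairs[:n_test]
    train = pairs[n_test:]
    return train, test


if __name__ == "__main__":
    src = ["sentence %d in language A" % i for i in range(10)]
    tgt = ["sentence %d in language B" % i for i in range(10)]
    train, test = split_parallel(src, tgt, test_fraction=0.2)
    print(len(train), "training pairs,", len(test), "test pairs")
```

Shuffling the pairs together, rather than each side separately, is what keeps the sentence alignment intact in both parts.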

My question is: can this training be done with the functionality built into GIZA++, or do I have to write a separate application for it?

If a separate application is needed, can anybody suggest an existing one, or a manual? Note: I want a word-alignment program, not a statistical machine translation system.

I would prefer to train in GIZA++ itself, so that I can then test on unseen data.

Thanks in advance.

m-Abrontan asked Aug 29 '12 08:08




1 Answer

This should be a good starting point for training a baseline MT system using Moses.
Normally GIZA++ is used for word-aligning your parallel corpus.
You'll need some other tools such as IRSTLM, SRILM, or KenLM for language model estimation.

Mortaza Doulaty answered Oct 19 '22 19:10