We'd like to train the Stanford NN dependency parser on a Russian corpus. Are there any hints on how to do it? The hyper-parameters are described in the paper, but it would be nice to understand how to prepare the training data (the annotations, and specifically how to create the word2vec embeddings). Any help or a pointer to some documentation is greatly appreciated!
Thanks!
Here are some answers:
The site for word2vec, if you want to build vector representations for Russian:
https://code.google.com/p/word2vec/
The dependencies need to be in the CoNLL-X format:
http://ilk.uvt.nl/conll/#dataformat
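As a rough illustration of the CoNLL-X layout, here is a minimal Python sketch that writes one sentence as ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with a blank line ending the sentence. The helper names, the example sentence, and the tag and relation labels are made up for illustration; use whatever tag set your Russian treebank actually has.

```python
# Sketch only: helper names and the example labels are hypothetical.
# CoNLL-X columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL

def conllx_line(idx, form, postag, head, deprel, lemma="_", feats="_"):
    """Format one token as a 10-column, tab-separated CoNLL-X line."""
    # The coarse tag is set equal to the fine tag here (an assumption);
    # the two projective columns (PHEAD, PDEPREL) are left as "_".
    columns = [str(idx), form, lemma, postag, postag, feats,
               str(head), deprel, "_", "_"]
    return "\t".join(columns)


def conllx_sentence(tokens):
    """Join token lines; a blank line marks the end of the sentence."""
    return "\n".join(conllx_line(*tok) for tok in tokens) + "\n\n"


if __name__ == "__main__":
    # (id, form, postag, head, deprel); head 0 marks the root token.
    sentence = [
        (1, "Мама", "NOUN", 2, "nsubj"),
        (2, "мыла", "VERB", 0, "root"),
        (3, "раму", "NOUN", 2, "obj"),
    ]
    print(conllx_sentence(sentence), end="")
```

Underscores stand in for fields you do not have, such as lemmas or morphological features.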
The word embeddings should be in this format (each word vector on its own line):
WORD\tn0 n1 n2 n3 n4 ...
for instance:
apple .45242 .392323 .111423 .999334
Put your embeddings in a file called russian_embeddings.txt.
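If you train the vectors with the word2vec tool and save them in its text (non-binary) output, that file normally starts with a header line giving the vocabulary size and the dimension, followed by one word per line. The sketch below drops the header and rewrites each line in the WORD, tab, values layout described above. The function names and the input file name vectors.txt are assumptions for illustration.

```python
# Sketch only: function names and "vectors.txt" are hypothetical.

def is_header(line):
    """word2vec text output begins with '<vocab_size> <dimension>'."""
    parts = line.split()
    return len(parts) == 2 and all(p.isdigit() for p in parts)


def to_parser_format(lines):
    """Yield 'WORD<TAB>n0 n1 n2 ...' lines, skipping header and blank lines."""
    for i, line in enumerate(lines):
        parts = line.split()
        if not parts or (i == 0 and is_header(line)):
            continue
        word, values = parts[0], parts[1:]
        yield word + "\t" + " ".join(values)


def convert_file(src, dst):
    """Convert a word2vec text file into the one-vector-per-line file."""
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for out in to_parser_format(fin):
            fout.write(out + "\n")


if __name__ == "__main__":
    import os
    if os.path.exists("vectors.txt"):
        convert_file("vectors.txt", "russian_embeddings.txt")
```

Reading and writing as UTF-8 matters for Cyrillic text; adjust the encoding if your corpus uses something else.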
The training command (this assumes your word vectors have dimension 50):
java edu.stanford.nlp.parser.nndep.DependencyParser -tlp edu.stanford.nlp.trees.international.RussianTreebankLanguagePack -trainFile russian/train.conll -devFile russian/dev.conll -embedFile russian_embeddings.txt -embeddingSize 50 -model nndep.russian.model.txt.gz
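Since the value passed to -embeddingSize has to agree with the vectors in the file, it may be worth checking the file before starting a long training run. Here is a small sketch of such a check; the function name is made up, and it simply treats every whitespace-separated token after the first as one vector component.

```python
# Sketch only: check_embedding_dims is a hypothetical helper.

def check_embedding_dims(lines, expected_dim):
    """Return (line_number, word, found_dim) for each mismatching line."""
    mismatches = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        found = len(parts) - 1  # everything after the word is the vector
        if found != expected_dim:
            mismatches.append((lineno, parts[0], found))
    return mismatches


if __name__ == "__main__":
    import os
    path = "russian_embeddings.txt"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for lineno, word, found in check_embedding_dims(f, 50):
                print(f"line {lineno}: {word!r} has {found} values, expected 50")
```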
A big complication is that, as of this moment, edu.stanford.nlp.trees.international.RussianTreebankLanguagePack does not exist, so you will have to create this class yourself and model it after the TreebankLanguagePacks for other languages. If you look around in the package edu.stanford.nlp.trees.international, you can see what these TreebankLanguagePack files look like for other languages. (Note: the French one is only 143 lines long, so making a similar class for Russian is not out of the question at all.) I will consult with other group members and see if I can get some clarity on what you'd have to do to complete this task.
There are a lot of challenges to building this Russian NN dependency parsing model. If you would like more help, please let me know. I will talk to the developers of the NN parser and see if I can give you more advice. These answers are meant as a starting point!