Any hints on how to train a nn dependency parser on a new corpus?

Question

We'd like to train the Stanford NN dependency parser on a Russian corpus, are there any hints on how to do it? The hyper-parameters are described in the paper, however it would be nice to understand how to prepare the training data (Annotations, and specifically how to create word2vec annotations). Any help or a reference to some document is greatly appreciated!

Thanks!

StanfordNLPHelp · Accepted Answer

Here are some answers:

the site for word2vec if you want to build vector representations for Russian:

https://code.google.com/p/word2vec/
the dependencies need to be in the CoNLL-X format:

http://ilk.uvt.nl/conll/#dataformat

The word embeddings should be in this format (each word vector on its own line):

WORD n0 n1 n2 n3 n4 ...

for instance:

apple .45242 .392323 .111423 .999334

put your embeddings in a file called russian_embeddings.txt

the training command (assumes your word vectors have dimension=50)

java edu.stanford.nlp.parser.nndep.DependencyParser -tlp edu.stanford.nlp.trees.international.RussianTreebankLanguagePack -trainFile russian/train.conll -devFile russian/dev.conll -embedFile russian_embeddings.txt -embeddingSize 50 -model nndep.russian.model.txt.gz

A big complication is that as of the moment, edu.stanford.nlp.trees.international.RussianTreebankLanguagePack does not exist, so you will have to create this class and model it after the TreebankLanguagePacks for other languages ; if you look around in the package edu.stanford.nlp.trees.international , you can see what these TreebankLanguagePack files look like for other languages (note: the French one is only 143 lines long, so making a similar class for Russian is not out of the question at all) ; I will consult with other group members and see if I can get some clarity on what you'd have to do to complete this task
There are a lot of challenges to building this Russian NN dependency parse model. If you would like more help please let me know. I will talk to the developers of the NN parser and see if I can give you more advice, these answers are meant as a starting point!

Any hints on how to train a nn dependency parser on a new corpus?

Tags:

stanford-nlp

randomsurfer_123

1 Answers

StanfordNLPHelp

Recent Activity

Donate For Us

Any hints on how to train a nn dependency parser on a new corpus?

Tags:

stanford-nlp

randomsurfer_123

1 Answers

StanfordNLPHelp

Related questions

Recent Activity

Donate For Us