Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Annotating a Corpus (Syntaxnet)

I downloaded and installed SyntaxNet following Syntax official documentation on Github. following the documentation (annotating corpus) I tried to read a .conll file named wj.conll by SyntaxNet and write the results in wj-tagged.conll but I could not. My questions are:

  1. does SyntaxNet always reads .conll files? (not .txt files?). I got a bit confused as I knew SyntaxNet reads .conll file for training and testing process but I am a bit suspicious that it is necessary to convert a .txt file to .conll file in order to have their Part Of Speach and Dependancy Parsing.

  2. How can I make SyntaxNet reads from files (I tired all possible ways explain in GitHub documentation about SyntaxNet and It didn't work for me)

like image 832
Nazanin Tajik Avatar asked May 26 '16 15:05

Nazanin Tajik


1 Answers

Add these declaration lines to "context.pbtxt" at the end of the file. Here "inp" and "out" are the text files present in the root directory of syntexnet.

   input {
   name: 'inp_file'
   record_format: 'english-text'
     Part {
     file_pattern: 'inp'
     }
   }
   input {
   name: 'out_file'
   record_format: 'english-text'
     Part {
     file_pattern: 'out'
     }
   }

Add sentences to the "inp" file for which you want tagging to be done and specify them in shell the next time you run syntaxnet using --input and --output tags.

Just to help you a bit more I am pasting an example shell command.

bazel-bin/syntaxnet/parser_eval \
--input inp_file \
--output stdout-conll \
--model syntaxnet/models/parsey_mcparseface/tagger-params \
--task_context syntaxnet/models/parsey_mcparseface/context.pbtxt \
--hidden_layer_sizes 64 \
--arg_prefix brain_tagger \
--graph_builder structured \
--slim_model \
--batch_size 1024 | bazel-bin/syntaxnet/parser_eval \
--input stdout-conll  \
--output out_file \
--hidden_layer_sizes 512,512 \
--arg_prefix brain_parser \
--graph_builder structured \
--task_context syntaxnet/models/parsey_mcparseface/context.pbtxt \
--model_path syntaxnet/models/parsey_mcparseface/parser-params \
--slim_model --batch_size 1024

In the above script the output(POS tagging) of the first shell command is used as an input for the second shell command, where the two shell commands are seperated by "|"

like image 70
Harsh Patni Avatar answered Sep 21 '22 13:09

Harsh Patni