I am trying to understand and learn SyntaxNet, and I am trying to figure out whether there is any way to use it for Named Entity Recognition on a corpus. Any sample code or helpful links would be appreciated.
Every detected entity is classified into a predetermined category. For example, an NER machine learning (ML) model might detect the word “super.AI” in a text and classify it as a “Company”. NER is a form of natural language processing (NLP), a subfield of artificial intelligence.
There are two main approaches used to achieve this goal: ontology-based models and deep learning-based models. Ontology-based Named Entity Recognition uses a knowledge-based recognition process that relies on curated lists, such as a list of company names for the Company category, to make inferences.
To perform named entity recognition with NLTK, you have to perform three steps, as sketched below:
1. Convert your text to tokens using the word_tokenize() function.
2. Find the part-of-speech tag for each word using the pos_tag() function.
3. Pass the list of (word, POS tag) tuples to the ne_chunk() function.
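A minimal sketch of those three steps, assuming the standard NLTK resources have been downloaded (the sample sentence is my own):

import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

# One-time downloads of the resources the three functions depend on
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

text = "Larry Page founded Google in California."
tokens = word_tokenize(text)   # step 1: tokenize
tagged = pos_tag(tokens)       # step 2: POS-tag each token
tree = ne_chunk(tagged)        # step 3: chunk named entities

# Entities come back as subtrees labelled PERSON, ORGANIZATION, GPE, etc.
for subtree in tree:
    if hasattr(subtree, 'label'):
        print(subtree.label(), " ".join(word for word, tag in subtree.leaves()))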
In BERT, the id 101 is reserved for the special [CLS] token, the id 102 is reserved for the special [SEP] token, and the id 0 is reserved for the [PAD] token. token_type_ids identifies which sequence a token belongs to; since we only have one sequence per text, all the values of token_type_ids will be 0.
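You can inspect those reserved ids directly. Here is a small sketch; it assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, neither of which is named above:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
encoding = tokenizer("SyntaxNet does dependency parsing.",
                     padding='max_length', truncation=True, max_length=10)

# 101 at the front ([CLS]), 102 after the last word piece ([SEP]),
# trailing 0s for [PAD] up to max_length
print(encoding['input_ids'])

# All 0s, because there is only one sequence per text
print(encoding['token_type_ids'])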
While SyntaxNet does not explicitly offer any Named Entity Recognition functionality, Parsey McParseface does part-of-speech tagging and produces its output as a CoNLL table.
Any proper noun is tagged as NNP, and I have found that a simple regex-style identifier over the tags, like so: <NNP>+
i.e. one or more proper nouns put together, gives a fairly good yield of named entities within a document. It is of course rudimentary and rule-based, but effective nonetheless.
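To make the <NNP>+ pattern concrete, here is a rough sketch using NLTK's RegexpParser over POS tags (the grammar label NE and the sample sentence are mine; any tagger that emits Penn Treebank tags would do):

import nltk
from nltk import word_tokenize, pos_tag

# Chunk grammar: one or more consecutive proper nouns form a candidate entity
chunker = nltk.RegexpParser("NE: {<NNP>+}")

tagged = pos_tag(word_tokenize("Parsey McParseface was released by Google Research."))
tree = chunker.parse(tagged)

for subtree in tree.subtrees(filter=lambda t: t.label() == 'NE'):
    print(" ".join(word for word, tag in subtree.leaves()))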
In order to pipe the CoNLL data to an output file from the demo.sh script (located in "/opt/tensorflow/models/syntaxnet/syntaxnet"), comment out the section of the code that pipes it to conll2ascii.py, so that the script looks like this:
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
$PARSER_EVAL \
--input=$INPUT_FORMAT \
--output=stdout-conll \
--hidden_layer_sizes=64 \
--arg_prefix=brain_tagger \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/tagger-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
$PARSER_EVAL \
--input=stdin-conll \
--output=sample-param \
--hidden_layer_sizes=512,512 \
--arg_prefix=brain_parser \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/parser-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr
You will also notice that the output parameter in the second stage above was changed to sample-param. We will now define it. Open the context.pbtxt file (located in "/opt/tensorflow/models/syntaxnet/syntaxnet/models/parsey_mcparseface") and add an input entry that points to your output file. It should look something like this:
input {
  name: 'sample-param'
  record_format: 'conll-sentence'
  Part {
    file_pattern: "directory/prepoutput.txt"
  }
}
Save and close the file, return to "/opt/tensorflow/models/syntaxnet", and run syntaxnet/demo.sh as given in the SyntaxNet tutorial. On completion, go to the specified output folder and you should find a table in CoNLL format. You can then run a simple iterative program that goes over each entry, reads the POS tags, and tries variations of my suggested <NNP>+ pattern for entity recognition, as sketched below.
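A rough sketch of such a program, assuming the output file from the file_pattern above and the standard CoNLL-X column layout (word form in column 2, fine-grained POS tag in column 5):

# Collect runs of consecutive NNP tokens, i.e. the <NNP>+ pattern above
entities, current = [], []

with open("directory/prepoutput.txt") as f:
    for line in f:
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 4 and cols[4] == "NNP":
            current.append(cols[1])            # extend the current run
        elif current:                          # a run of NNPs just ended
            entities.append(" ".join(current))
            current = []
    if current:                                # flush a run at end of file
        entities.append(" ".join(current))

print(entities)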
Hope this helped!
I have used GATE, which can perform Named Entity Recognition, and it does not require parsing for NER. Although the part-of-speech tagger in SyntaxNet can identify nouns, noun modifiers, etc. (which makes it a more powerful tool for distinguishing the different roles of named entities), I am not sure how fast it would perform at identifying NEs.