
Named Entity Recognition with SyntaxNet

I am trying to understand and learn SyntaxNet. Is there any way to use SyntaxNet for named entity recognition on a corpus? Any sample code or helpful links would be appreciated.

Anantha asked Jun 29 '16 20:06





2 Answers

While SyntaxNet does not explicitly offer any named entity recognition functionality, Parsey McParseface does part-of-speech tagging and produces its output as a CoNLL table.

Any proper noun is tagged as NNP, and I have found that a simple regex-style pattern such as <NNP>+ (one or more consecutive proper nouns) gives a fairly good yield of named entities within a document. It is of course rudimentary and rule-based, but effective nonetheless.
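As a minimal sketch of the <NNP>+ idea, assuming the tagger output has already been reduced to (word, tag) pairs, the following groups each maximal run of consecutive NNP tokens into one candidate entity. The function name proper_noun_runs and the example sentence are made up for illustration; they are not part of SyntaxNet.

```python
from itertools import groupby


def proper_noun_runs(tagged_tokens):
    """Return each maximal run of consecutive NNP-tagged tokens as one string.

    tagged_tokens is a list of (word, pos_tag) pairs (an assumed input shape).
    """
    runs = []
    # groupby splits the sequence into alternating NNP / non-NNP stretches.
    for is_nnp, group in groupby(tagged_tokens, key=lambda pair: pair[1] == "NNP"):
        if is_nnp:
            runs.append(" ".join(word for word, _ in group))
    return runs


# Hypothetical tagged sentence, for illustration only.
example = [
    ("Barack", "NNP"),
    ("Obama", "NNP"),
    ("visited", "VBD"),
    ("Paris", "NNP"),
    (".", "."),
]
print(proper_noun_runs(example))  # ['Barack Obama', 'Paris']
```

Only the NNP tag is matched here, as in the pattern above; plural proper nouns (NNPS) would need to be added to the key function if they matter for your data.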

To write the CoNLL data to an output file from the demo.sh script (located in "/opt/tensorflow/models/syntaxnet/syntaxnet"), comment out the section of the code that pipes the output to conll2ascii.py, so that the script looks like this:

PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin

$PARSER_EVAL \
  --input=$INPUT_FORMAT \
  --output=stdout-conll \
  --hidden_layer_sizes=64 \
  --arg_prefix=brain_tagger \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/tagger-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr \
   | \
  $PARSER_EVAL \
  --input=stdin-conll \
  --output=sample-param \
  --hidden_layer_sizes=512,512 \
  --arg_prefix=brain_parser \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/parser-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr

Note that the output parameter in the script above was changed to sample-param, which we now need to define. Open the context.pbtxt file (located in "/opt/tensorflow/models/syntaxnet/syntaxnet/models/parsey_mcparseface") and add an input entry that points to your output file. It should look something like this:

input {
  name: 'sample-param'
  record_format: 'conll-sentence'
  Part {
    file_pattern: "directory/prepoutput.txt"
  }
}

Save and close the file, return to "/opt/tensorflow/models/syntaxnet", and run syntaxnet/demo.sh as described in the SyntaxNet tutorial. When it finishes, go to the output folder you specified; it should contain a table in CoNLL format. You can then run a simple program that iterates over each entry, reads its POS tag, and applies variations of my suggested pattern to recognise entities.
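The iterative program could look roughly like the sketch below. It assumes the output uses the tab-separated CoNLL-X layout (word form in column 2, fine-grained POS tag in column 5, blank line between sentences); check this against your own file before relying on it. The function names and the sample rows are hypothetical.

```python
def read_conll_sentences(lines):
    """Yield each sentence as a list of (form, pos_tag) pairs.

    Assumes tab-separated CoNLL-X rows with sentences separated by blank lines.
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        # Assumed layout: columns[1] is FORM, columns[4] is the fine-grained POS tag.
        sentence.append((columns[1], columns[4]))
    if sentence:
        yield sentence


def candidate_entities(sentence):
    """Join each run of consecutive NNP tokens into one candidate entity."""
    entities, current = [], []
    for form, tag in sentence:
        if tag == "NNP":
            current.append(form)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# Made-up rows in the assumed format; head and relation columns are illustrative.
sample_lines = [
    "1\tJohn\t_\tNOUN\tNNP\t_\t2\tnn\t_\t_",
    "2\tSmith\t_\tNOUN\tNNP\t_\t3\tnsubj\t_\t_",
    "3\tvisited\t_\tVERB\tVBD\t_\t0\tROOT\t_\t_",
    "4\tLondon\t_\tNOUN\tNNP\t_\t3\tdobj\t_\t_",
    "",
    "1\tShe\t_\tPRON\tPRP\t_\t2\tnsubj\t_\t_",
    "2\tleft\t_\tVERB\tVBD\t_\t0\tROOT\t_\t_",
]

for sentence in read_conll_sentences(sample_lines):
    print(candidate_entities(sentence))
```

To run it on real output, pass an open file handle instead of sample_lines, for example the file named in file_pattern in context.pbtxt.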

Hope this helped!

YashTD answered Nov 15 '22 08:11



I have used GATE, which can perform named entity recognition and does not require parsing to do so. Although the part-of-speech tagger in SyntaxNet can identify nouns, noun modifiers, etc. (which makes it a more powerful tool for specifying the different roles of named entities), I am not sure how fast it would be at identifying named entities.

Nazanin Tajik answered Nov 15 '22 09:11
