I am trying to understand and learn SyntaxNet, and I am trying to figure out whether there is any way to use it for Named Entity Recognition on a corpus. Any sample code or helpful links would be appreciated.
Every detected entity is classified into a predetermined category. For example, an NER machine learning (ML) model might detect the word “super.AI” in a text and classify it as a “Company”. NER is a form of natural language processing (NLP), a subfield of artificial intelligence.
There are two main approaches used to achieve this goal: ontology-based models and deep learning-based models. Ontology-based Named Entity Recognition uses a knowledge-based recognition process that relies on curated lists, such as a list of company names for the Company category, to make inferences.
To perform named entity recognition with NLTK, you have to perform three steps, as sketched below:
1. Convert your text to tokens using the word_tokenize() function.
2. Find the part-of-speech tag for each word using the pos_tag() function.
3. Pass the list of (word, POS tag) tuples to the ne_chunk() function.
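A minimal sketch of those three steps, assuming the standard NLTK resources have been downloaded (the sample sentence is my own):

import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

# One-time downloads of the resources the three functions depend on
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

text = "Larry Page founded Google in California."
tokens = word_tokenize(text)   # step 1: tokenize
tagged = pos_tag(tokens)       # step 2: POS-tag each token
tree = ne_chunk(tagged)        # step 3: chunk named entities

# Entities come back as subtrees labelled PERSON, ORGANIZATION, GPE, etc.
for subtree in tree:
    if hasattr(subtree, 'label'):
        print(subtree.label(), " ".join(word for word, tag in subtree.leaves()))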
In BERT, the id 101 is reserved for the special [CLS] token, the id 102 is reserved for the special [SEP] token, and the id 0 is reserved for the [PAD] token. token_type_ids identifies which sequence a token belongs to; since we only have one sequence per text, all the values of token_type_ids will be 0.
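You can inspect those reserved ids directly. Here is a small sketch; it assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, neither of which is named above:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
encoding = tokenizer("SyntaxNet does dependency parsing.",
                     padding='max_length', truncation=True, max_length=10)

# 101 at the front ([CLS]), 102 after the last word piece ([SEP]),
# trailing 0s for [PAD] up to max_length
print(encoding['input_ids'])

# All 0s, because there is only one sequence per text
print(encoding['token_type_ids'])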
While SyntaxNet does not explicitly offer any Named Entity Recognition functionality, Parsey McParseface does part-of-speech tagging and produces its output as a CoNLL table.
Any proper noun is tagged as NNP, and I have found that a simple regex-style identifier over the tags, like so: <NNP>+
i.e. one or more proper nouns put together, gives a fairly good yield of named entities within a document. It is of course rudimentary and rule-based, but effective nonetheless.
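To make the <NNP>+ pattern concrete, here is a rough sketch using NLTK's RegexpParser over POS tags (the grammar label NE and the sample sentence are mine; any tagger that emits Penn Treebank tags would do):

import nltk
from nltk import word_tokenize, pos_tag

# Chunk grammar: one or more consecutive proper nouns form a candidate entity
chunker = nltk.RegexpParser("NE: {<NNP>+}")

tagged = pos_tag(word_tokenize("Parsey McParseface was released by Google Research."))
tree = chunker.parse(tagged)

for subtree in tree.subtrees(filter=lambda t: t.label() == 'NE'):
    print(" ".join(word for word, tag in subtree.leaves()))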
In order to pipe the CoNLL data to an output file from the demo.sh script (located in "/opt/tensorflow/models/syntaxnet/syntaxnet"), comment out the section of the code that pipes it to conll2ascii.py, so that the script looks like this:
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
$PARSER_EVAL \
--input=$INPUT_FORMAT \
--output=stdout-conll \
--hidden_layer_sizes=64 \
--arg_prefix=brain_tagger \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/tagger-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
$PARSER_EVAL \
--input=stdin-conll \
--output=sample-param \
--hidden_layer_sizes=512,512 \
--arg_prefix=brain_parser \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/parser-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr
You will also notice that the output parameter in the second stage above was changed to sample-param. We will now define it. Open the context.pbtxt file (located in "/opt/tensorflow/models/syntaxnet/syntaxnet/models/parsey_mcparseface") and add an input entry that points to your output file. It should look something like this:
input {
  name: 'sample-param'
  record_format: 'conll-sentence'
  Part {
    file_pattern: "directory/prepoutput.txt"
  }
}
Save and close the file, return to "/opt/tensorflow/models/syntaxnet", and run syntaxnet/demo.sh as given in the SyntaxNet tutorial. On completion, go to the specified output folder and you should find a table in CoNLL format. You can then run a simple iterative program that goes over each entry, reads the POS tags, and tries variations of my suggested <NNP>+ pattern for entity recognition, as sketched below.
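A rough sketch of such a program, assuming the output file from the file_pattern above and the standard CoNLL-X column layout (word form in column 2, fine-grained POS tag in column 5):

# Collect runs of consecutive NNP tokens, i.e. the <NNP>+ pattern above
entities, current = [], []

with open("directory/prepoutput.txt") as f:
    for line in f:
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 4 and cols[4] == "NNP":
            current.append(cols[1])            # extend the current run
        elif current:                          # a run of NNPs just ended
            entities.append(" ".join(current))
            current = []
    if current:                                # flush a run at end of file
        entities.append(" ".join(current))

print(entities)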
Hope this helped!
I have used GATE, which can perform Named Entity Recognition, and it does not require parsing for NER. Although the part-of-speech tagger in SyntaxNet can identify nouns, noun modifiers, etc. (which makes it a more powerful tool for distinguishing the different roles of named entities), I am not sure how fast it would perform at identifying NEs.