Perhaps I've skipped over a part of the docs, but what I am trying to determine is a unique ID for each entity in the standard NER toolset. For example:
import spacy
from spacy import displacy
import en_core_web_sm

nlp = en_core_web_sm.load()
text = "This is a text about Apple Inc based in San Fransisco. "\
    "And here is some text about Samsung Corp. "\
    "Now, here is some more text about Apple and its products for customers in Norway"
doc = nlp(text)
for ent in doc.ents:
    # ent.label is the integer ID of the label (ORG, GPE, ...), not of the entity itself
    print('ID:{}\t{}\t"{}"\t'.format(ent.label, ent.label_, ent.text))
displacy.render(doc, jupyter=True, style='ent')
returns:
ID:381	ORG	"Apple Inc"
ID:382	GPE	"San Fransisco"
ID:381	ORG	"Samsung Corp."
ID:381	ORG	"Apple"
ID:382	GPE	"Norway"
I have been looking at ent.ent_id and ent.ent_id_ but these are inactive according to the docs. I couldn't find anything in ent.root either.
For example, in GCP NLP each entity is returned with an ⟨entity⟩number that enables you to identify multiple instances of the same entity within a text.
This is a ⟨text⟩2 about ⟨Apple Inc⟩1 based in ⟨San Fransisco⟩4. And here is some ⟨text⟩3 about ⟨Samsung Corp⟩6. Now, here is some more ⟨text⟩8 about ⟨Apple⟩1 and its ⟨products⟩5 for ⟨customers⟩7 in ⟨Norway⟩9
Does spaCy support something similar? Or is there a way using NLTK or Stanford?
Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for a lot of higher level NLP tasks that involve natural language understanding such as document summarization, question answering, and information extraction.
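To make the definition concrete, the result of coreference resolution can be pictured as clusters of mentions, where every mention in a cluster refers to the same entity. The sketch below is only an illustration: the clusters are written by hand for the sentence "My sister has a dog. She loves him." rather than produced by a model, and `cluster_ids` is a hypothetical helper name.

```python
def cluster_ids(clusters):
    """Map each mention string to the 1-based index of its cluster."""
    lookup = {}
    for cluster_id, mentions in enumerate(clusters, start=1):
        for mention in mentions:
            lookup[mention] = cluster_id
    return lookup


# Hand-written clusters (an assumption for illustration, not model output).
clusters = [["My sister", "She"], ["a dog", "him"]]
ids = cluster_ids(clusters)

for mention, cluster_id in ids.items():
    print(cluster_id, mention)
```

Mentions that share a cluster number are what the question calls "multiple instances of the same entity".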
spaCy comes with an extremely fast statistical entity recognition system for Python that assigns labels to contiguous spans of tokens. Its default model recognizes a wide range of named and numerical entities, including persons, organizations, languages, and events. You can also add arbitrary classes to the entity recognizer and update the model with new examples beyond the entities it already defines.
Retrain spaCy's NER with your custom examples: if you have, for instance, a few hundred examples containing real addresses, you can annotate them manually and then retrain the spaCy NER so that it learns your particular address format. You can train a new NER model from scratch or fine-tune an existing one.
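As a rough sketch of what manual annotation involves, spaCy 2.x-style training examples pair a text with character offsets for each entity, in the form `(text, {"entities": [(start, end, label)]})`. The helper `make_example`, the `ADDRESS` label, and the sample sentence below are all made up for illustration; the code only builds the annotation and does not call spaCy.

```python
def make_example(text, entity_strings):
    """Build one (text, annotations) pair with character offsets.

    entity_strings is a list of (substring, label) pairs. Each substring is
    located with str.find, so only its first occurrence is annotated.
    """
    entities = []
    for substring, label in entity_strings:
        start = text.find(substring)
        if start == -1:
            raise ValueError(f"{substring!r} not found in text")
        entities.append((start, start + len(substring), label))
    return (text, {"entities": entities})


# Made-up sentence and a hypothetical custom label "ADDRESS".
example = make_example(
    "Ship it to 221B Baker Street, London.",
    [("221B Baker Street", "ADDRESS"), ("London", "GPE")],
)
print(example)
```

A few hundred pairs like this would then be fed to whatever training loop your spaCy version provides.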
You can use the neuralcoref library to get coreference resolution working with spaCy's models:
# Load your usual spaCy model (one of spaCy's English models)
import spacy
nlp = spacy.load('en')
# Add NeuralCoref to spaCy's pipe
import neuralcoref
neuralcoref.add_to_pipe(nlp)
# You're done. You can now use NeuralCoref as you usually manipulate spaCy document annotations.
doc = nlp('My sister has a dog. She loves him.')
doc._.has_coref        # True if any coreference was resolved in the doc
doc._.coref_clusters   # the clusters of coreferring mentions in the doc
Find the installation and usage instructions here: https://github.com/huggingface/neuralcoref
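If full coreference resolution is more than you need, a much cruder option is to give mentions the same ID whenever their label and a normalized form of their text match. The sketch below is a plain string heuristic, not something spaCy or neuralcoref provides: the function names and the `CORPORATE_SUFFIXES` list are hypothetical, and the input mimics the `(ent.text, ent.label_)` pairs from the question. It would link "Apple Inc" and "Apple" but never "Apple" and "the company".

```python
import re

# Hypothetical list of company suffixes to ignore when comparing names.
CORPORATE_SUFFIXES = {"inc", "corp", "co", "ltd", "llc"}


def normalize(text):
    """Lowercase, drop punctuation, and strip trailing company suffixes."""
    tokens = re.findall(r"\w+", text.lower())
    while len(tokens) > 1 and tokens[-1] in CORPORATE_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def assign_entity_ids(mentions):
    """Give each (text, label) mention an ID shared by matching mentions."""
    ids = {}
    result = []
    for text, label in mentions:
        key = (label, normalize(text))
        if key not in ids:
            ids[key] = len(ids) + 1  # IDs follow order of first appearance
        result.append((ids[key], label, text))
    return result


# Pairs as they would come from (ent.text, ent.label_) in the question.
mentions = [
    ("Apple Inc", "ORG"),
    ("San Fransisco", "GPE"),
    ("Samsung Corp.", "ORG"),
    ("Apple", "ORG"),
    ("Norway", "GPE"),
]
for entity_id, label, text in assign_entity_ids(mentions):
    print('ID:{}\t{}\t"{}"'.format(entity_id, label, text))
```

Keying on the label as well as the text keeps, say, an ORG and a GPE with the same surface form apart.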