spaCy coreference resolution - named entity recognition (NER) to return unique entity IDs?

Perhaps I've skipped over a part of the docs, but what I am trying to determine is a unique ID for each entity in the standard NER toolset. For example:

import spacy
from spacy import displacy
import en_core_web_sm
nlp = en_core_web_sm.load()

text = "This is a text about Apple Inc based in San Fransisco. "\
        "And here is some text about Samsung Corp. "\
        "Now, here is some more text about Apple and its products for customers in Norway"

doc = nlp(text)

for ent in doc.ents:
    print('ID:{}\t{}\t"{}"\t'.format(ent.label,ent.label_,ent.text,))


displacy.render(doc, jupyter=True, style='ent')

returns:

ID:381    ORG "Apple Inc" 
ID:382    GPE "San Fransisco" 
ID:381    ORG "Samsung Corp." 
ID:381    ORG "Apple" 
ID:382    GPE "Norway"

I have been looking at ent.ent_id and ent.ent_id_ but these are inactive according to the docs. I couldn't find anything in ent.root either.

For example, in GCP NLP each entity is returned with an ⟨entity⟩number that enables you to identify multiple instances of the same entity within a text.

This is a ⟨text⟩2 about ⟨Apple Inc⟩1 based in ⟨San Fransisco⟩4. And here is some ⟨text⟩3 about ⟨Samsung Corp⟩6. Now, here is some more ⟨text⟩8 about ⟨Apple⟩1 and its ⟨products⟩5 for ⟨customers⟩7 in ⟨Norway⟩9

Does spaCy support something similar? Or is there a way using NLTK or Stanford?

482

asked Dec 12 '18 19:12

BenP

1 Answer

You can use the neuralcoref library to add coreference resolution to a spaCy pipeline:

# Load your usual spaCy English model (neuralcoref 4.0 was built against spaCy 2.x)
import spacy
nlp = spacy.load('en_core_web_sm')

# Add NeuralCoref to spaCy's pipeline
import neuralcoref
neuralcoref.add_to_pipe(nlp)

# You're done. The coreference annotations are now available on the Doc through the ._ extension attributes.
doc = nlp('My sister has a dog. She loves him.')

print(doc._.has_coref)       # whether any coreference was resolved in the doc
print(doc._.coref_clusters)  # clusters of mentions that refer to the same entity

Find the installation and usage instructions here: https://github.com/huggingface/neuralcoref
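
To get from coreference clusters to the per-entity IDs the question asks for, one possible approach is to give every entity that appears in a cluster the index of that cluster, and give every other entity a fresh ID. The following is only a minimal sketch under that assumption: assign_entity_ids is a hypothetical helper, not part of spaCy or neuralcoref, and it works on plain (start, end) token offsets so it does not depend on either library. It matches spans exactly, so an entity only inherits a cluster ID if its boundaries coincide with a mention.

```python
from itertools import count


def assign_entity_ids(entities, clusters):
    """Give entities in the same coreference cluster the same ID.

    entities: list of (start, end) token offsets, one per named entity.
    clusters: list of clusters, each a list of (start, end) mention offsets.
    Returns a dict mapping each entity span to an integer ID.
    (Hypothetical helper for illustration; not a spaCy/neuralcoref API.)
    """
    # Index every mention span by the position of its cluster.
    mention_to_cluster = {}
    for cluster_idx, mentions in enumerate(clusters):
        for span in mentions:
            mention_to_cluster[tuple(span)] = cluster_idx

    # Entities outside any cluster get fresh IDs after the cluster indices.
    fresh_ids = count(len(clusters))
    ids = {}
    for span in entities:
        span = tuple(span)
        if span in mention_to_cluster:
            ids[span] = mention_to_cluster[span]
        elif span not in ids:
            ids[span] = next(fresh_ids)
    return ids


# Made-up offsets: the first and fourth entities are mentions of one cluster.
example_entities = [(5, 7), (10, 12), (18, 20), (27, 28), (34, 35)]
example_clusters = [[(5, 7), (27, 28), (29, 30)]]
example_ids = assign_entity_ids(example_entities, example_clusters)
for span, entity_id in example_ids.items():
    print(span, entity_id)
```

With a doc processed as above, the inputs could be built as [(e.start, e.end) for e in doc.ents] and [[(m.start, m.end) for m in c.mentions] for c in doc._.coref_clusters]. Whether two mentions such as "Apple Inc" and "Apple" actually land in the same cluster depends on the coreference model, so the resulting IDs are only as good as the clusters it produces.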

177

answered Sep 28 '22 17:09

scorp

Tags: python, nlp, information-extraction, named-entity-recognition, spacy
