
Replace entity with its label in SpaCy

Is there any way in spaCy to replace an entity detected by spaCy's NER with its label? For example: "I am eating an apple while playing with my Apple Macbook."

I have trained an NER model with spaCy to detect a "FRUITS" entity, and the model successfully detects the first "apple" as "FRUITS", but not the second "Apple".

I want to post-process my data by replacing each entity with its label, so I want to replace the first "apple" with "FRUITS". The sentence would then be "I am eating an FRUITS while playing with my Apple Macbook."

If I simply use regex, it will replace the second "Apple" with "FRUITS" as well, which is incorrect. Is there any smart way to do this?

Thanks!

eng2019 asked Nov 05 '19 13:11


2 Answers

The entity label is an attribute of the token, token.ent_type_ (see the spaCy Token documentation).

import spacy

# Assumes the en_core_web_lg model is installed
nlp = spacy.load('en_core_web_lg')

s = "His friend Nicolas is here."
doc = nlp(s)

# t.ent_type_ is an empty string for tokens outside any entity
print([t.ent_type_ or t.text for t in doc])
# ['His', 'friend', 'PERSON', 'is', 'here', '.']

# Joining on single spaces also puts a space before the final period
print(" ".join(t.ent_type_ or t.text for t in doc))
# His friend PERSON is here .
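Joining tokens with a single space changes the original spacing (note the space before the final period). A minimal sketch of one way to keep the spacing is to append each token's own trailing whitespace, token.whitespace_, instead. To stay runnable without a trained model, this example uses a blank English pipeline and labels the first "apple" by hand; the hand-set FRUITS span is an assumption standing in for the asker's custom NER output.

```python
import spacy
from spacy.tokens import Span

# Tokenizer-only pipeline; no trained model is loaded here
nlp = spacy.blank("en")
doc = nlp("I am eating an apple while playing with my Apple Macbook.")

# Assumption: stand in for the custom NER model by labelling token 4 ("apple") by hand
doc.ents = [Span(doc, 4, 5, label="FRUITS")]

# Use the label for entity tokens, the text otherwise, and keep each token's trailing whitespace
result = "".join((t.ent_type_ or t.text) + t.whitespace_ for t in doc)
print(result)
# I am eating an FRUITS while playing with my Apple Macbook.
```

Like the snippet above, this replaces token by token, so an entity that spans several tokens would produce its label once per token.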

Edit:

To handle cases where entities can span several words, the following code can be used instead:

s = "His friend Nicolas J. Smith is here with Bart Simpon and Fred."
doc = nlp(s)
new_string = s
# Iterate in reverse so that a substitution does not shift the offsets of entities not yet replaced
for e in reversed(doc.ents):
    new_string = new_string[:e.start_char] + e.label_ + new_string[e.end_char:]
print(new_string)
# His friend PERSON is here with PERSON and PERSON.
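The reason for iterating in reverse can be shown without spaCy at all. The sketch below uses a hypothetical helper, replace_spans, and hand-written (start, end, label) character offsets in place of real NER output. It is only an illustration of the offset logic, not part of spaCy.

```python
def replace_spans(text, spans):
    """Replace each (start, end, label) character span in text with its label."""
    # Work from the rightmost span to the leftmost, so replacing one span
    # never shifts the offsets of the spans still waiting to be replaced
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + label + text[end:]
    return text


sentence = "I am eating an apple while playing with my Apple Macbook."
# Assumption: the NER model tagged only the first, lowercase "apple"
start = sentence.index("apple")
spans = [(start, start + len("apple"), "FRUITS")]
print(replace_spans(sentence, spans))
# I am eating an FRUITS while playing with my Apple Macbook.
```

Because the replacement is driven by the entity's character offsets rather than by its text, the second "Apple" is left alone, which is what a plain regex substitution cannot guarantee.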

Update:

Jinhua Wang brought to my attention that there is now a simpler, built-in way to do this using the merge_entities pipeline component. See Jinhua's answer below.

DBaker answered Sep 21 '22 03:09


A more elegant modification to @DBaker's solution above when entities can span several words:

import spacy

nlp = spacy.load('en_core_web_lg')
# Merge each entity into a single token (adding a component by name requires spaCy v3+)
nlp.add_pipe("merge_entities")

s = "His friend Nicolas J. Smith is here with Bart Simpon and Fred."
doc = nlp(s)

# After merging, a multi-word entity is one token, so its label appears only once
print([t.ent_type_ or t.text for t in doc])
# ['His', 'friend', 'PERSON', 'is', 'here', 'with', 'PERSON', 'and', 'PERSON', '.']

print(" ".join(t.ent_type_ or t.text for t in doc))
# His friend PERSON is here with PERSON and PERSON .

See the spaCy documentation for the merge_entities pipeline component. It uses the built-in pipeline for the job and has good support for multiprocessing. I believe this is the officially supported way to replace entities with their labels.

Jinhua Wang answered Sep 23 '22 03:09