
How to get Index of an Entity in a Sentence in Spacy?

Tags:

python

nlp

spacy

Is there an elegant way to get the character offsets of an entity relative to the sentence it appears in? I know I can get an entity's offsets with ent.start_char and ent.end_char, but those values are relative to the entire string.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"Apple is looking at buying U.K. startup for $1 billion. Apple just launched a new Credit Card.")

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

I want the entity Apple in both sentences to have start and end indexes of 0 and 5 respectively. How can I do that?

iCHAIT asked Aug 22 '19 10:08


1 Answer

Subtract the start offset of the entity's sentence (ent.sent.start_char) from both the entity's start and end offsets:

for ent in doc.ents:
    print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)
#                                 ^^^^^^^^^^^^^^^^^^^^              ^^^^^^^^^^^^^^^^^^^^

Output:

Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY
Apple 0 5 ORG
Credit Card 26 37 ORG
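
If the same subtraction is needed in several places, it can be wrapped in a small helper. The following is only a minimal sketch: the function name sentence_relative_offsets is hypothetical, and it assumes the object passed in behaves like a spaCy Span, exposing start_char, end_char and sent. Note that ent.sent only works when the pipeline has set sentence boundaries (for example via the parser in en_core_web_sm).

```python
def sentence_relative_offsets(ent):
    """Return (start, end) character offsets of `ent` within its own sentence.

    Assumption: `ent` behaves like a spaCy Span, i.e. it exposes `start_char`,
    `end_char` and `sent`, and `sent.start_char` is the offset of the
    sentence within the full document text.
    """
    # Offset of the containing sentence in the whole document.
    sent_start = ent.sent.start_char
    # Shift both document-level offsets so they count from the sentence start.
    return ent.start_char - sent_start, ent.end_char - sent_start
```

With the doc from the question, calling print(ent.text, *sentence_relative_offsets(ent), ent.label_) inside the loop over doc.ents should print the same lines as the output above.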
Wiktor Stribiżew answered Nov 14 '22 23:11