
How to determine if a token is part of an entity in spaCy?

Tags:

python

spacy

I have

import spacy

nlp = spacy.load("en_core_web_lg")
line = "Rio de Janeiro is the capital of.."
doc = nlp(line)
# Print the lemma of every token
for tok in doc:
    print(tok.lemma_)
# Print the lemma of every named-entity span
for ent in doc.ents:
    print(ent.lemma_)

I want to produce wikified output: "[[Rio de Janeiro]] [[be|is]] [[the]] [[capital]] [[of]]..". How can I determine whether the token "Rio" is part of the entity "Rio de Janeiro"?

Saku asked Aug 06 '20 11:08



1 Answer

Check the token's ent_type_ attribute (or its integer counterpart, ent_type). If ent_type_ is not an empty string, the token is part of an entity.

Edit: the ent_iob_ attribute (ent_iob is its integer counterpart) also tells you where the token sits within the entity: "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set.

import spacy

nlp = spacy.load("en_core_web_lg")
line = "Rio de Janeiro is the capital of.."
doc = nlp(line)
for tok in doc:
    print(tok, tok.ent_type_, tok.ent_iob_)

Output:

Rio GPE B
de GPE I
Janeiro GPE I
is  O
the  O
capital  O
of  O
..  O
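
To get from these tags to the wikified string in the question, one option is to collect consecutive "B"/"I" tokens into a single link and wrap every other word on its own. The following is only a sketch: wikify and fake are hypothetical helper names, and the sample tokens are hand-built stand-ins that mirror the output above, so the snippet runs without downloading a model. It assumes punctuation should stay unlinked and that the lemma goes before the "|" whenever it differs from the surface form.

```python
from types import SimpleNamespace


def wikify(tokens):
    """Build a [[...]]-linked string, grouping entity tokens via ent_iob_.

    `tokens` is any iterable of objects exposing the spaCy Token attributes
    text, lemma_, ent_iob_, is_punct and whitespace_ (a spaCy Doc qualifies).
    """
    pieces = []  # finished output fragments, each including trailing whitespace
    entity = []  # tokens of the entity currently being collected

    def flush_entity():
        # Emit the collected entity as one link, then reset the buffer.
        if entity:
            inner = "".join(t.text + t.whitespace_ for t in entity[:-1])
            inner += entity[-1].text
            pieces.append("[[" + inner + "]]" + entity[-1].whitespace_)
            entity.clear()

    for tok in tokens:
        if tok.ent_iob_ == "B":    # a new entity starts here
            flush_entity()
            entity.append(tok)
        elif tok.ent_iob_ == "I":  # continuation of the current entity
            entity.append(tok)
        else:                      # "O" or "": not part of any entity
            flush_entity()
            if tok.is_punct:
                link = tok.text    # assumption: punctuation is left unlinked
            elif tok.lemma_ != tok.text:
                link = "[[" + tok.lemma_ + "|" + tok.text + "]]"
            else:
                link = "[[" + tok.text + "]]"
            pieces.append(link + tok.whitespace_)
    flush_entity()
    return "".join(pieces)


def fake(text, lemma, iob, space=" ", punct=False):
    # Hypothetical stand-in for a spaCy Token, for demonstration only.
    return SimpleNamespace(text=text, lemma_=lemma, ent_iob_=iob,
                           whitespace_=space, is_punct=punct)


sample = [
    fake("Rio", "Rio", "B"), fake("de", "de", "I"), fake("Janeiro", "Janeiro", "I"),
    fake("is", "be", "O"), fake("the", "the", "O"), fake("capital", "capital", "O"),
    fake("of", "of", "O", space=""), fake("..", "..", "O", space="", punct=True),
]
result = wikify(sample)
print(result)  # [[Rio de Janeiro]] [[be|is]] [[the]] [[capital]] [[of]]..
```

With a real pipeline the call would be wikify(nlp(line)); the exact links then depend on the entities and lemmas the loaded model predicts.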
thorntonc answered Oct 21 '22 11:10
