I have this code:
import spacy
nlp = spacy.load("en_core_web_lg")
line = "Rio de Janeiro is the capital of.."
doc = nlp(line)
for tok in doc:
    print(tok.lemma_)
for ent in doc.ents:
    print(ent.lemma_)
I want to obtain a wikified version: "[[Rio de Janeiro]] [[be|is]] [[the]] [[capital]] [[of]].." How can I determine whether the token "Rio" is part of the entity "Rio de Janeiro"?
Use the token's ent_type_ attribute (or its integer counterpart, ent_type): if ent_type_ is not an empty string, the token is part of an entity.
Edit: you can also check the ent_iob_ attribute (or its integer counterpart, ent_iob): "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set.
import spacy
nlp = spacy.load("en_core_web_lg")
line = "Rio de Janeiro is the capital of.."
doc = nlp(line)
for tok in doc:
    print(tok, tok.ent_type_, tok.ent_iob_)
Output:
Rio GPE B
de GPE I
Janeiro GPE I
is O
the O
capital O
of O
.. O
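Building on the B/I/O tags, here is a minimal sketch of how the wikified string from the question could be assembled. The function name wikify and the Tok stand-in are hypothetical, introduced only so the example runs without a model installed. The sketch assumes each token exposes text, lemma_, ent_iob_ and whitespace_, as spaCy tokens do, and it assumes that only purely alphabetic tokens should be linked.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy Token, so the sketch runs without a model.
Tok = namedtuple("Tok", ["text", "lemma_", "ent_iob_", "whitespace_"])


def wikify(tokens):
    """Wrap entities and words in [[...]], grouping entity tokens via B/I tags."""
    out = []
    entity = []  # tokens of the entity currently being collected

    def flush():
        # Emit the collected entity as a single link, keeping inner spacing.
        if entity:
            inner = "".join(t.text + t.whitespace_ for t in entity[:-1])
            inner += entity[-1].text
            out.append("[[" + inner + "]]" + entity[-1].whitespace_)
            entity.clear()

    for tok in tokens:
        if tok.ent_iob_ == "I" and entity:
            entity.append(tok)  # continue the current entity
            continue
        flush()  # any other tag ends the current entity
        if tok.ent_iob_ == "B":
            entity.append(tok)  # start a new entity
        elif tok.text.isalpha():
            # Link to the lemma, showing the surface form when it differs.
            link = tok.text if tok.lemma_ == tok.text else tok.lemma_ + "|" + tok.text
            out.append("[[" + link + "]]" + tok.whitespace_)
        else:
            out.append(tok.text + tok.whitespace_)  # punctuation stays unlinked
    flush()
    return "".join(out)


# Hand-written tokens mirroring the output above (lemmas are assumed values).
sample = [
    Tok("Rio", "Rio", "B", " "),
    Tok("de", "de", "I", " "),
    Tok("Janeiro", "Janeiro", "I", " "),
    Tok("is", "be", "O", " "),
    Tok("the", "the", "O", " "),
    Tok("capital", "capital", "O", " "),
    Tok("of", "of", "O", ""),
    Tok("..", "..", "O", ""),
]
print(wikify(sample))
```

With a real pipeline, the doc object can be passed in directly, for example wikify(nlp(line)), since the function only reads those four token attributes. The exact result then depends on the lemmas and entities the loaded model predicts.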