 

Why are spaCy NER results highly unpredictable?

I tried spaCy for NER, but the results are highly unpredictable. Sometimes spaCy does not recognize a particular country. Can anyone please explain why this is happening? I tried it on some random sentences.

CASE 1:

import spacy

nlp = spacy.load("en_core_web_sm")
print(nlp)
sent = "hello china hello japan"
doc = nlp(sent)
for ent in doc.ents:
    print(ent.text, " ", ent.label_)

OUTPUT: no entities are printed in this case.

CASE 2:

import spacy

nlp = spacy.load("en_core_web_sm")
print(nlp)
sent = "china is a populous nation in East Asia whose vast landscape encompasses grassland, desert, mountains, lakes, rivers and more than 14,000km of coastline."
doc = nlp(sent)
for ent in doc.ents:
    print(ent.text, " ", ent.label_)

OUTPUT:

<spacy.lang.en.English object at 0x7f2213bde080>
china   GPE
East Asia   LOC
more than 14,000km   QUANTITY
BALA asked Dec 09 '25 13:12


1 Answer

Natural language models, like the spaCy NER model, learn from the contextual structure of the sentence (the surrounding words). Why is that? Let's take the word Anwarvic as an example: it is a new word that you haven't seen before, and the spaCy model probably hasn't seen it either. Let's see how the NER model acts when the sentence around it changes:

  • "I love Anwarvic"
>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> sent = "I love Anwarvic"
>>> doc = nlp(sent)
>>> for i in doc.ents:
...     print(i.text," ",i.label_)
Anwarvic   PERSON
  • "Anwarvic is gigantic"
>>> nlp = spacy.load("en_core_web_sm")
>>> sent = "Anwarvic is gigantic"
>>> doc = nlp(sent)
>>> for i in doc.ents:
...     print(i.text," ",i.label_)
Anwarvic   ORG
  • "Anwarvic is awesome"
>>> nlp = spacy.load("en_core_web_sm")
>>> sent = "Anwarvic is awesome"
>>> doc = nlp(sent)
>>> for i in doc.ents:
...     print(i.text," ",i.label_)

As we can see, the extracted entities vary when the context around Anwarvic varies. In the first sentence, the verb love is very commonly used with people, which is why the spaCy model predicted PERSON. The same happens in the second sentence, where gigantic is a word often used to describe organizations, hence ORG. In the third sentence, awesome is a pretty generic adjective that can describe basically anything, which is why the spaCy NER model was confused and returned no entity at all.
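To repeat this comparison without copying the loop three times, the experiment above can be wrapped in a small helper. This is only a sketch: entity_labels and main are hypothetical names introduced here, and the labels you get will depend on the model you have installed, so no expected output is shown.

```python
def entity_labels(nlp, sentences, target):
    """Map each sentence to the label predicted for `target`, or None if untagged.

    `nlp` is any callable that returns an object with an `ents` attribute,
    such as a loaded spaCy pipeline. (Hypothetical helper, for illustration.)
    """
    results = {}
    for sentence in sentences:
        doc = nlp(sentence)
        # doc.ents holds the entity spans predicted for this sentence
        labels = [ent.label_ for ent in doc.ents if ent.text == target]
        results[sentence] = labels[0] if labels else None
    return results


def main():
    # Imported here so the helper above has no hard dependency on spaCy
    import spacy

    nlp = spacy.load("en_core_web_sm")
    sentences = ["I love Anwarvic", "Anwarvic is gigantic", "Anwarvic is awesome"]
    for sentence, label in entity_labels(nlp, sentences, "Anwarvic").items():
        print(f"{sentence!r:28} -> {label}")


# Call main() to run the comparison; it needs spaCy and en_core_web_sm installed.
```

A value of None for a sentence means the model did not tag the word as an entity in that context.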

Sidenote

Actually, when I ran the first code sample from the question on my machine, it extracted both china and japan:

china   GPE
japan   GPE
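A difference like this between two machines may come from different installed versions of spaCy or of the en_core_web_sm package. That is an assumption, not something verified here. A minimal sketch for checking what is installed, where describe_pipeline and main are hypothetical names and the metadata keys are the ones spaCy stores in a pipeline's meta.json:

```python
def describe_pipeline(nlp):
    """Return a one-line summary of a loaded pipeline from its metadata.

    (Hypothetical helper, for illustration.)
    """
    meta = nlp.meta  # dict loaded from the package's meta.json
    name = f"{meta.get('lang', '?')}_{meta.get('name', '?')}"
    version = meta.get("version", "?")
    requires = meta.get("spacy_version", "?")
    return f"{name} {version} (built for spaCy {requires})"


def main():
    # Imported here so the helper above has no hard dependency on spaCy
    import spacy

    print("spaCy", spacy.__version__)
    print(describe_pipeline(spacy.load("en_core_web_sm")))


# Call main() on each machine and compare the printed versions.
```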
Anwarvic answered Dec 12 '25 12:12


