Stanford NER lowercase entities

I am having trouble detecting named entities that start with a lowercase letter. If I train the model on lowercase words only, the accuracy is reasonable; however, when the model is trained on fully uppercase tokens, or even a mix of lowercase and uppercase, the results are very bad. I tried some of the features documented for the Stanford NLP Group's NERFeatureFactory class, as well as a variety of sentences, but I could not get the results I expected. An example of the problem is as follows:

"ali studied at university of michigan and now he works for us navy."

I expected the model to recognize the entities as follows:

  • "university" : "FACILITY"
  • "of michigan" : "FACILITY"
  • "ali" : "PERSON"
  • "us" : "ORGANIZATION"
  • "navy" : "ORGANIZATION"

If the .TSV file used as training data contains ONLY lowercase letters, I get the above result; otherwise the result is surprising.

Any help is highly appreciated. Thanks in advance.

Ali.E asked Oct 29 '22 10:10



1 Answer

If your text is lowercase or mixed case, accuracy can suffer because the Stanford NLP models are trained on conventionally edited, properly capitalized text. There are a couple of useful ways to approach this problem:

  1. One way is to restore the correct capitalization with a truecase annotator, and then process the resulting text with the regular NER model.
  2. Another way is to explore caseless models including ones that are available as part of Stanford NER.
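If none of the available caseless models covers your labels (the question uses FACILITY, for example), one possibility in the spirit of the second option is to make your own training data caseless, which matches the asker's observation that lowercase-only training works. The sketch below is plain Python and not part of Stanford NER; it assumes the training file has two tab-separated columns (token, then label) with blank lines between sentences, and the function name `lowercase_tokens` is hypothetical.

```python
def lowercase_tokens(tsv_text: str) -> str:
    """Lowercase the token column of token<TAB>label training data.

    Assumption: column 0 is the token and everything after the first tab
    is the label. Labels and blank separator lines are left unchanged.
    """
    lines = tsv_text.splitlines()
    if not lines:
        return ""
    out_lines = []
    for line in lines:
        if not line.strip():
            # Keep blank lines, which separate sentences or documents.
            out_lines.append("")
            continue
        token, sep, label = line.partition("\t")
        out_lines.append(token.lower() + sep + label)
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    sample = (
        "Ali\tPERSON\n"
        "studied\tO\n"
        "at\tO\n"
        "University\tFACILITY\n"
        "\n"
        "US\tORGANIZATION\n"
        "Navy\tORGANIZATION\n"
    )
    print(lowercase_tokens(sample), end="")
```

If you train on data prepared this way, the text you tag later would need the same lowercasing, otherwise the model sees casing it never encountered during training.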

You can read more here.

Avinash Hindupur answered Nov 09 '22 16:11
