Stanford NER lowercase entities

Question

I am having trouble detecting named entities that start with a lowercase letter. If I train the model on lowercase-only words, the accuracy is reasonable; however, when the model is trained on fully uppercase tokens, or even on a mix of lowercase and uppercase, the results are very bad. I tried several of the features documented in the Stanford NLP Group's NERFeatureFactory class, as well as a variety of sentences, but I could not get the results I expected. An example of the problem is as follows:

"ali studied at university of michigan and now he works for us navy."

I expected the model to recognize the entities as follows:

"university" : "FACILITY"
"of michigan" : "FACILITY"
"ali" : "PERSON"
"us" : "ORGANIZATION"
"navy" : "ORGANIZATION"

If the .TSV file used as training data contains ONLY lowercase letters, I get the result above; otherwise, the result is surprising.

Any help is highly appreciated. Thanks in advance.

Avinash Hindupur · Accepted Answer

If you have lowercase or mixed-case text, accuracy can suffer because the Stanford NLP models are trained on standardly edited data. There are a couple of useful ways to approach this problem:

One way is to restore the correct capitalization with a truecase annotator, and then process the resulting text with the regular NER model.
Another way is to explore caseless models, including the ones that are available as part of Stanford NER.
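
As a rough sketch of how the two approaches might be configured in Stanford CoreNLP, the class below only builds the java.util.Properties objects, so it runs without the CoreNLP jars. The class and method names are made up for illustration. The model paths are the ones I know from the CoreNLP caseless-models documentation; they are an assumption about your installed version and normally require the additional English models jar on the classpath. Note also that the pretrained 3-class model only emits PERSON, ORGANIZATION and LOCATION, so a custom label such as FACILITY would still need a model trained on your own data.

```java
import java.util.Properties;

// Hypothetical helper: builds CoreNLP pipeline settings for lowercase input.
final class LowercaseNerConfig {

    // Approach 1: predict the true casing first, then run the regular NER models.
    static Properties trueCaseProps() {
        Properties props = new Properties();
        // truecase depends on tokenize, ssplit, pos and lemma; ner runs after it.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,truecase,ner");
        // Overwrite each token's text with its predicted casing so that
        // the ner annotator sees the recapitalized tokens.
        props.setProperty("truecase.overwriteText", "true");
        return props;
    }

    // Approach 2: keep the text as-is and swap in caseless models.
    // Paths are assumptions; check them against your models jar.
    static Properties caselessProps() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        props.setProperty("pos.model",
            "edu/stanford/nlp/models/pos-tagger/english-caseless-left3words-distsim.tagger");
        props.setProperty("ner.model",
            "edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz");
        return props;
    }

    public static void main(String[] args) {
        // With CoreNLP on the classpath, either object would be passed to
        // new StanfordCoreNLP(props); here the settings are only printed.
        System.out.println(trueCaseProps());
        System.out.println(caselessProps());
    }
}
```

It is worth running both configurations on a handful of your own sentences, since which one works better depends on the domain. In the truecase pipeline the POS tagger still runs on the original lowercase text, which may affect the result.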

You can read more here.

Tags: nlp, stanford-nlp, named-entity-recognition

Ali.E

1 Answers

Avinash Hindupur
