I am having trouble detecting named entities that start with a lowercase letter. If I train the model on lowercase words only, the accuracy is reasonable; however, when the model is trained on fully uppercase tokens, or even a mix of lowercase and uppercase, the results are very bad. I tried some of the features offered by the Stanford NLP Group's NERFeatureFactory class, as well as a variety of sentences, but I could not get the results I expected. An example of the problem I am facing is as follows:
"ali studied at university of michigan and now he works for us navy."
I expected the model to recognize the entities as follows:
If the .TSV file used as training data contains ONLY lowercase letters, I get the above result; otherwise, the result is surprising.
Any help is highly appreciated. Thanks in advance.
If you have lowercase or mixed-case text, accuracy can suffer because the Stanford NLP models are trained on standardly edited data, but there are a couple of useful ways to approach this problem.
You can read more here.
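Since the question reports reasonable accuracy when the training file is all lowercase, one option is to lowercase the token column of the training data so that it matches lowercase input. The following is only a minimal sketch, assuming a two-column `token<TAB>label` file with blank lines between sentences; the function name `lowercase_tokens` and the sample labels are made up for illustration and are not part of Stanford NER.

```python
def lowercase_tokens(lines):
    """Lowercase the first (token) column of tab-separated lines.

    Assumes each non-blank line is "token<TAB>label"; labels are left
    untouched and blank sentence separators are preserved.
    """
    out = []
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped.strip():
            # Keep blank lines, which separate sentences.
            out.append("")
            continue
        token, sep, rest = stripped.partition("\t")
        out.append(token.lower() + sep + rest)
    return out


# Illustrative rows only; the labels here are hypothetical.
sample = ["Ali\tPERSON", "studied\tO", "", "US\tORGANIZATION"]
for row in lowercase_tokens(sample):
    print(row)
```

The same transformation would need to be applied to the text you tag at prediction time, so that training and input casing agree.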