Stanford NER toolkit - lowercase entities recognition

I am a newbie to NLP and am trying to figure out how a named entity recognizer annotates named entities. I am experimenting with the Stanford NER toolkit. When I run the NER on standard, more formal datasets where naming conventions are followed (for example, newswire text or news blogs), it annotates the entities correctly. However, when I run it on informal datasets such as Twitter, where named entities are often not capitalized as they should be, the NER does not annotate them. The classifier I am using is a 3-class CRF serialized classifier.

Can anybody tell me how I can make the NER recognize lowercase entities too? Any suggestions on how to hack the NER, and where this improvement should be made, are greatly appreciated. Thanks in advance for all your help.
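A rough illustration of why this happens: CRF taggers such as Stanford NER use word-shape and capitalization features, among many others. The function below is a simplified, hypothetical shape feature, not Stanford NER's actual implementation. In cased text "Obama" and "the" get different shapes, but lowercased "obama" collapses to the same shape as "the", so a model trained on cased newswire loses one of its strongest clues.

```python
def word_shape(token: str) -> str:
    """Simplified word shape (illustrative only): X = uppercase, x = lowercase,
    d = digit, other characters kept; runs of the same class are collapsed."""
    shape = []
    for ch in token:
        if ch.isupper():
            cls = "X"
        elif ch.islower():
            cls = "x"
        elif ch.isdigit():
            cls = "d"
        else:
            cls = ch
        # Collapse repeats so "Obama" becomes "Xx" rather than "Xxxxx".
        if not shape or shape[-1] != cls:
            shape.append(cls)
    return "".join(shape)


if __name__ == "__main__":
    for tok in ["Obama", "obama", "the", "NYC"]:
        print(tok, word_shape(tok))  # Xx, x, x, X
```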

asked Nov 20 '10 23:11

Anu

3 Answers

I know this is an old thread, but I hope it will help someone. As Christopher Manning replied, the way to get lowercase entities detected is to replace english.muc.7class.distsim.crf.ser.gz with english.muc.7class.caseless.distsim.crf.ser.gz, which you get when you unzip the CoreNLP caseless models jar file.

For example, in my Python file I kept everything the same except switching to the new classifier file, and it works perfectly (well, most of the time):

from nltk.tag.stanford import NERTagger  # renamed StanfordNERTagger in later NLTK releases
st = NERTagger('/Users/username/stanford-corenlp-python/stanford-ner-2014-10-26/classifiers/english.muc.7class.caseless.distsim.crf.ser.gz', '/Users/username/stanford-corenlp-python/stanford-ner-2014-10-26/stanford-ner.jar')
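Building on the snippet above, here is a minimal sketch that picks between the two classifier files named in this answer depending on how much of the input is lowercase. The helper names and the 0.9 threshold are hypothetical, not part of NLTK or Stanford NER. Since caseless models do somewhat worse on properly cased text, switching only when the input lacks capitalization seems a reasonable default, but the heuristic itself is an assumption.

```python
import os

# Classifier filenames as given in the answer; the directory is wherever you unpacked them.
CASED_MODEL = "english.muc.7class.distsim.crf.ser.gz"
CASELESS_MODEL = "english.muc.7class.caseless.distsim.crf.ser.gz"


def mostly_lowercase(text: str, threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: True if at least `threshold` of the tokens that
    contain letters have no uppercase letter at all."""
    words = [w for w in text.split() if any(c.isalpha() for c in w)]
    if not words:
        return False
    lower = sum(1 for w in words if not any(c.isupper() for c in w))
    return lower / len(words) >= threshold


def choose_classifier(text: str, classifier_dir: str) -> str:
    """Return the path of the caseless model for mostly lowercase text, else the cased one."""
    name = CASELESS_MODEL if mostly_lowercase(text) else CASED_MODEL
    return os.path.join(classifier_dir, name)
```

The returned path would be passed as the first argument to the tagger constructor, in place of the hard-coded classifier path.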

answered Nov 14 '22 23:11

Avi

I'm afraid there isn't an easy way to get the trained models we distribute to ignore case information at runtime. So, yes, they'll usually only label capitalized names. It would be possible to train a caseless model, which would work reasonably well, though not as well on cased text, since case is a big clue in English (but not in German, Chinese, Arabic, etc.).
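One straightforward way to prepare training data for a caseless model is to lowercase the tokens of existing labeled data while leaving the labels untouched. This is an assumption on my part, not necessarily how the distributed caseless models were built. The sketch assumes one token per line, tab-separated columns with the token first, and blank lines between sentences; the function names are hypothetical. You would then train on the output as you would on the original file.

```python
from typing import Iterable, Iterator


def lowercase_conll(lines: Iterable[str]) -> Iterator[str]:
    """Yield training lines with only the token column lowercased.

    Assumes tab-separated columns with the token first; blank lines
    (sentence boundaries) are passed through unchanged."""
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped.strip():
            yield stripped
            continue
        cols = stripped.split("\t")
        cols[0] = cols[0].lower()  # labels in later columns keep their case
        yield "\t".join(cols)


def lowercase_file(src_path: str, dst_path: str) -> None:
    """Write a lowercased copy of a training file."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for out in lowercase_conll(src):
            dst.write(out + "\n")
```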

answered Nov 15 '22 01:11

Christopher Manning

In addition to the other suggestions: if you're using a feature-based classifier, I would definitely add features for the 100-200 most common 3-4 letter substrings found in people's names, or build a gazetteer and use it as a single feature. Certain patterns show up quite often in personal names but rarely in other kinds of words, like "eli".
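A minimal sketch of the substring idea, assuming you have a list of known personal names to mine. The function names are hypothetical, and the `top_k` default of 200 simply follows the 100-200 range suggested above. It collects the most frequent 3- and 4-letter substrings and exposes a binary feature that fires on any token containing one of them, regardless of case. Wiring the feature into a particular classifier is not shown.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def name_substrings(names: Iterable[str], sizes: Tuple[int, ...] = (3, 4),
                    top_k: int = 200) -> Set[str]:
    """Return the top_k most frequent lowercase character n-grams in the names."""
    counts: Counter = Counter()
    for name in names:
        for word in name.lower().split():  # avoid n-grams that span a space
            for n in sizes:
                for i in range(len(word) - n + 1):
                    counts[word[i:i + n]] += 1
    return {gram for gram, _ in counts.most_common(top_k)}


def has_name_substring(token: str, grams: Set[str]) -> bool:
    """Binary feature: does the lowercased token contain a frequent name n-gram?"""
    t = token.lower()
    return any(g in t for g in grams)
```

The feature is deliberately weak: "delicious" also contains "eli", so it only helps when the classifier weighs it alongside context and other features.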

answered Nov 15 '22 01:11

John Cadigan

Tags: java, stanford-nlp, named-entity-recognition
