How are StanfordNER classifiers built?

Question

I am working with the StanfordNER classifiers. There are four of them:

english.all.3class.distsim.crf.ser.gz
english.muc.7class.distsim.crf.ser.gz
english.conll.4class.distsim.crf.ser.gz
example.serialized.ncc.ncc.ser.gz

How are these classifiers built? Since each of them is based on a different corpus, here are my two guesses:

1. Train a machine learning classifier such as an SVM coupled with OVR (for the multi-label case) on the corpus to detect entities like ORGANIZATION, PERSON, LOCATION, etc. The training data would be the entire text of each document in the corpus, with the ORGANIZATIONs, PERSONs and LOCATIONs in that text explicitly annotated. The classifier would then be able to predict those entities in new text.
2. Train a machine learning classifier to link POS tags with entities like ORGANIZATION, PERSON, LOCATION. For example, a classifier could be trained to predict which proper nouns should be labelled ORGANIZATION.

Is this the correct big picture? I am just trying to work out how to build my own NER.

Christopher Manning · Accepted Answer

Yes, the models are trained on supervised data. They're first-order CRFs, which do multi-class probabilistic sequence classification (so not OVR, and not an SVM). You can find an introduction to NER, and to Stanford NER in particular, on the Stanford NER page.
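To make "first-order sequence classification" concrete, here is a minimal sketch in Python. It is not Stanford NER's code: the feature names, the hand-set weights, and the helper functions are all hypothetical, and a real CRF learns its weights from labelled data and normalises scores into probabilities. The sketch only shows the decoding idea: each label is scored from features of the token plus a transition score from the previous label, and the best label sequence is found jointly with Viterbi rather than token by token.

```python
from typing import Dict, List, Tuple

LABELS = ["O", "PERSON", "ORGANIZATION", "LOCATION"]

# Hypothetical hand-set weights; a trained CRF would learn these.
EMISSION: Dict[Tuple[str, str], float] = {
    ("is_capitalized", "PERSON"): 1.0,
    ("is_capitalized", "ORGANIZATION"): 1.0,
    ("is_capitalized", "LOCATION"): 1.0,
    ("is_lowercase", "O"): 2.0,
    ("word=chris", "PERSON"): 2.0,
    ("prev_word=at", "ORGANIZATION"): 2.0,
    ("prev_word=in", "LOCATION"): 2.0,
}

# First-order part: scores depend on the previous label only.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("PERSON", "PERSON"): 1.5,
    ("ORGANIZATION", "ORGANIZATION"): 1.5,
    ("LOCATION", "LOCATION"): 1.5,
}


def features(tokens: List[str], i: int) -> List[str]:
    """Toy feature extractor for the token at position i."""
    word = tokens[i]
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    return [
        "is_capitalized" if word[0].isupper() else "is_lowercase",
        "word=" + word.lower(),
        "prev_word=" + prev_word,
    ]


def emission_score(tokens: List[str], i: int, label: str) -> float:
    """Sum the weights of all features that fire for this label."""
    return sum(EMISSION.get((f, label), 0.0) for f in features(tokens, i))


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for the tokens."""
    if not tokens:
        return []
    best = [{y: emission_score(tokens, 0, y) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            # Best previous label for ending in label y at position i.
            prev = max(
                LABELS,
                key=lambda p: best[-1][p] + TRANSITION.get((p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev]
                + TRANSITION.get((prev, y), 0.0)
                + emission_score(tokens, i, y)
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    sentence = "Chris works at Stanford University in California".split()
    for token, label in zip(sentence, viterbi(sentence)):
        print(token, label)
```

In this toy setup the token "University" has the same feature score for PERSON, ORGANIZATION and LOCATION; it is the ORGANIZATION-to-ORGANIZATION transition from "Stanford" that settles it. That dependence on the neighbouring label is what a per-token OVR classifier does not model.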

Tags:

machine-learning

classification

nlp

stanford-nlp

named-entity-recognition

AbtPst
