Named entity recognition (NER) features

I'm new to Named Entity Recognition and I'm having some trouble understanding what/how features are used for this task.

Some papers I've read so far mention the features used but don't really explain them. For example, "Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition" lists the following features:

Main features used by the sixteen systems that participated in the CoNLL-2003 shared task sorted by performance on the English test data. Aff: affix information (n-grams); bag: bag of words; cas: global case information; chu: chunk tags; doc: global document information; gaz: gazetteers; lex: lexical features; ort: orthographic information; pat: orthographic patterns (like Aa0); pos: part-of-speech tags; pre: previously predicted NE tags; quo: flag signing that the word is between quotes; tri: trigger words.

I'm a bit confused by some of these, however. For example:

isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word as in BOW, besides all the other features mentioned?
how can a gazetteer be a feature?
how exactly can POS tags be used as features? Don't we have a POS tag for each word? Isn't each object/instance a "text"?
what is global document information?
what is the feature trigger words?

I think all I need here is just to look at an example table with each of these features as columns and see their values to understand how they really work, but so far I've failed to find an easy-to-read dataset.

Could someone please clarify or point me to some explanation or example of these features being used?

asked Feb 02 '17 12:02

Mr. Phil

1 Answer

Here's a shot at some answers (and by the way the terminology on all this stuff is super overloaded).

isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word as in BOW, besides all the other features mentioned?
how can a gazetteer be a feature?

In my experience, BOW feature extraction is used to produce word features from sentences. So, in my opinion, BOW is not one feature; it is a method of generating features from a sentence (or whatever block of text you are using). Using n-grams can help account for word order, but BOW features amount to unordered bags of strings.
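
To make that concrete, here is a minimal sketch in plain Python. The helper names `bag_of_words` and `bag_of_ngrams` are made up for this illustration and are not from any particular NER system; the point is only that two sentences with the same words in a different order get identical bags, while their bigram bags differ.

```python
from collections import Counter

def bag_of_words(tokens):
    """Unordered bag: lowercased token -> count. Word order is discarded."""
    return Counter(t.lower() for t in tokens)

def bag_of_ngrams(tokens, n=2):
    """Same idea, but each feature is a run of n consecutive tokens."""
    lowered = [t.lower() for t in tokens]
    return Counter(" ".join(lowered[i:i + n]) for i in range(len(lowered) - n + 1))

a = "Paris visited Texas".split()
b = "Texas visited Paris".split()

# Same words, different order: the bags match, the bigram bags do not.
same_bag = bag_of_words(a) == bag_of_words(b)
same_bigrams = bag_of_ngrams(a) == bag_of_ngrams(b)
```

Each key in the resulting counter would become one column (one feature) in the feature table.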

how exactly can POS tags be used as features? Don't we have a POS tag for each word?

POS tags are used as features because they can help with "word sense disambiguation" (at least on a theoretical level). For instance, the word "May" can be a person's name, a month of the year, or a verb that happens to be capitalized, and the POS tag can be a feature that helps tell these apart. And yes, you can get a POS tag for each word, but unless you explicitly put those tags into your "feature space", the model has no access to each word's part of speech.
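
A rough sketch of what "putting the tags into the feature space" could look like, with one row of features per token. The POS tags below are hand-written in Penn Treebank style purely for illustration (a real pipeline would get them from a tagger), and the feature names are my own invention.

```python
# Hand-written (word, POS) pairs; the tags are illustrative assumptions,
# not the output of a real tagger.
sent_a = [("May", "NNP"), ("arrived", "VBD"), ("in", "IN"), ("Boston", "NNP")]
sent_b = [("May", "MD"), ("I", "PRP"), ("visit", "VB"), ("Boston", "NNP")]

def token_features(sentence, i):
    """Build the feature dict (one table row) for the token at position i."""
    word, pos = sentence[i]
    prev_pos = sentence[i - 1][1] if i > 0 else "<START>"
    next_pos = sentence[i + 1][1] if i < len(sentence) - 1 else "<END>"
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "pos": pos,            # POS tag of the token itself
        "prev_pos": prev_pos,  # POS tag of the token to the left
        "next_pos": next_pos,  # POS tag of the token to the right
    }

rows_a = [token_features(sent_a, i) for i in range(len(sent_a))]
rows_b = [token_features(sent_b, i) for i in range(len(sent_b))]
```

The two rows for "May" have the same `word.lower` and `word.istitle` values, so the only thing separating them is the POS columns.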

Isn't each object/instance a "text"?

If you mean what I think you mean, then this is true only if you have extracted object-instance "pairs" and stored them as features (an array of them derived from a string of tokens).

what is global document information?

As I understand it: most NLP tasks operate on one sentence at a time. Global document information is data drawn from the surrounding text across the entire document. For instance, if you are trying to extract geographic place names and disambiguate them, and you find the word "Paris", which one is it? If France is mentioned five sentences earlier, that increases the likelihood that it is Paris, France rather than Paris, Texas or, in the worst case, the person Paris Hilton. It's also really important in what is called "coreference resolution", which is when you link a name to a pronoun that refers to it (mapping a name mention to "he" or "she", etc.).
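
Here is one way the Paris example might be turned into features, as a sketch only. The function `document_features` and the `cue_words` parameter are hypothetical names for this illustration; the CoNLL-2003 systems each defined their own document-level features.

```python
def document_features(sentences, sent_idx, tok_idx, cue_words):
    """Features for one token that look at the whole document, not just its sentence."""
    token = sentences[sent_idx][tok_idx]
    # Every lowercased token that appears anywhere in the document.
    doc_vocab = {t.lower() for sent in sentences for t in sent}
    # One boolean feature per cue word: is it mentioned anywhere in the document?
    features = {f"doc_mentions:{cue}": cue in doc_vocab for cue in cue_words}
    # How many times this exact surface form occurs in the document.
    features["doc_count"] = sum(t == token for sent in sentences for t in sent)
    return features

doc = [
    "France announced new rail funding".split(),
    "The minister spoke on Monday".split(),
    "Trains to Paris will run more often".split(),
]

# Features for "Paris" (third sentence, third token).
feats = document_features(doc, sent_idx=2, tok_idx=2, cue_words=["france", "texas"])
```

A classifier that only sees the third sentence has nothing to go on, whereas `doc_mentions:france` carries the hint from two sentences earlier.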

what is the feature trigger words?

Trigger words are specific tokens or sequences that, on their own, reliably signal a specific meaning. For instance, in sentiment analysis, curse words with exclamation marks often indicate negativity. In NER, the analogous case would be a token such as "Mr." or "Inc." next to a name. There can be many variations of this.
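
For NER specifically, a trigger-word feature could be sketched like this, assuming triggers are things like titles before a name or company suffixes after one. The two trigger lists and the feature names are invented for this example and are far smaller than anything you would use in practice.

```python
# Hypothetical trigger lists, for illustration only.
PERSON_TRIGGERS = {"mr.", "mrs.", "ms.", "dr.", "president"}
ORG_TRIGGERS = {"inc.", "ltd.", "corp."}

def trigger_features(tokens, i):
    """Flag whether the neighbours of token i are known trigger words."""
    prev_tok = tokens[i - 1].lower() if i > 0 else ""
    next_tok = tokens[i + 1].lower() if i < len(tokens) - 1 else ""
    return {
        "prev_is_person_trigger": prev_tok in PERSON_TRIGGERS,
        "next_is_org_trigger": next_tok in ORG_TRIGGERS,
    }

tokens = "Dr. Paris joined Acme Inc. last year".split()
paris_feats = trigger_features(tokens, 1)  # "Paris", preceded by "Dr."
acme_feats = trigger_features(tokens, 3)   # "Acme", followed by "Inc."
```

Here "Paris" gets `prev_is_person_trigger` set, which pushes it toward a person label rather than a location.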

Anyway, my answers here are not perfect, and are prone to all manner of problems in human epistemology and inter-subjectivity, but this is how I've been thinking about these things over the years I've been trying to solve problems with NLP.

Hopefully someone else will chime in, especially if I'm way off.

answered Oct 13 '22 12:10

Mark Giaconia

Tags: machine-learning, classification, nlp, feature-selection, named-entity-recognition
