 

How to train a naive bayes classifier with pos-tag sequence as a feature?

I have two classes of sentences. Each has a reasonably distinct POS-tag sequence. How can I train a Naive Bayes classifier with the POS-tag sequence as a feature? Does Stanford CoreNLP or NLTK (Java or Python) provide any method for building a classifier with POS tags as features? I know that in Python NLTK's NaiveBayesClassifier allows building an NB classifier, but it uses contains-a-word features. Can it be extended to use the POS-tag sequence as a feature?

kundan asked Feb 27 '15 11:02


People also ask

How do I train Naive Bayes classifier?

Step 1: Calculate the prior probability for the given class labels. Step 2: Find the likelihood probability of each attribute for each class. Step 3: Put these values into Bayes' formula and calculate the posterior probability. Step 4: See which class has the higher probability; the input belongs to that class.
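The four steps above can be sketched in plain Python. The toy dataset, feature names, and values below are invented purely for illustration:

```python
from collections import Counter, defaultdict

# Toy dataset: each sample is (features, label). All names and values
# here are made up for illustration.
data = [
    ({'outlook': 'sunny', 'windy': 'no'}, 'play'),
    ({'outlook': 'sunny', 'windy': 'yes'}, 'no_play'),
    ({'outlook': 'rainy', 'windy': 'yes'}, 'no_play'),
    ({'outlook': 'sunny', 'windy': 'no'}, 'play'),
]

# Step 1: prior probability for each class label.
label_counts = Counter(label for _, label in data)
priors = {label: count / len(data) for label, count in label_counts.items()}

# Step 2: likelihood of each attribute value given each class.
value_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
for features, label in data:
    for feat, value in features.items():
        value_counts[(label, feat)][value] += 1

def likelihood(label, feat, value):
    counts = value_counts[(label, feat)]
    return counts[value] / sum(counts.values()) if counts else 0.0

# Steps 3 and 4: posterior score (up to a normalizing constant)
# for each class, then pick the class with the higher score.
def classify(features):
    scores = {}
    for label in priors:
        score = priors[label]
        for feat, value in features.items():
            score *= likelihood(label, feat, value)
        scores[label] = score
    return max(scores, key=scores.get)

print(classify({'outlook': 'sunny', 'windy': 'no'}))  # -> 'play'
```

Note this sketch uses unsmoothed counts, so an attribute value never seen with a class zeroes out that class; real implementations add smoothing.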

What does the Naive Bayes classifier assume the features are?

It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

How HMM is used for POS tagging explain in detail?

Use of HMM for POS Tagging The POS tagging process is the process of finding the sequence of tags which is most likely to have generated a given word sequence. We can model this POS process by using a Hidden Markov Model (HMM), where tags are the hidden states that produced the observable output, i.e., the words.
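Decoding the most likely hidden tag sequence from an HMM is typically done with the Viterbi algorithm. Here is a minimal sketch with a hand-set tagset, transition, and emission probabilities (all numbers are invented for illustration, not estimated from a corpus):

```python
# Tiny Viterbi decoder over hand-set HMM parameters.
tagset = ['DT', 'NN', 'VB']
start = {'DT': 0.6, 'NN': 0.3, 'VB': 0.1}          # P(first tag)
trans = {                                           # P(next tag | tag)
    'DT': {'DT': 0.05, 'NN': 0.9, 'VB': 0.05},
    'NN': {'DT': 0.1, 'NN': 0.3, 'VB': 0.6},
    'VB': {'DT': 0.5, 'NN': 0.3, 'VB': 0.2},
}
emit = {                                            # P(word | tag)
    'DT': {'the': 0.9},
    'NN': {'dog': 0.8, 'barks': 0.1},
    'VB': {'dog': 0.1, 'barks': 0.8},
}

def viterbi(words):
    # V[i][t]: probability of the best tag path ending in tag t at word i.
    V = [{t: start[t] * emit[t].get(words[0], 0.0) for t in tagset}]
    back = []  # back[i][t]: previous tag on that best path
    for word in words[1:]:
        col, ptr = {}, {}
        for t in tagset:
            best_prev = max(tagset, key=lambda p: V[-1][p] * trans[p][t])
            col[t] = V[-1][best_prev] * trans[best_prev][t] * emit[t].get(word, 0.0)
            ptr[t] = best_prev
        V.append(col)
        back.append(ptr)
    # Trace the best path backwards from the most probable final tag.
    last = max(tagset, key=lambda t: V[-1][t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(['the', 'dog', 'barks']))  # -> ['DT', 'NN', 'VB']
```

In practice the probabilities are estimated from a tagged corpus (e.g. with NLTK's HiddenMarkovModelTrainer) rather than set by hand.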

What is CRF for POS tagging?

A CRF is a sequence modeling algorithm which is used to identify entities or patterns in text, such as POS tags. This model not only assumes that features are dependent on each other, but also considers future observations while learning a pattern.


1 Answer

If you know how to train and classify texts (or sentences, in your case) using NLTK's naive Bayes classifier with words as features, then you can easily extend this approach to classify texts by POS tags, because the classifier doesn't care whether your feature strings are words or tags. So you can simply replace the words of your sentences with their POS tags, using for example NLTK's standard POS tagger:

import nltk

sent = ['So', 'they', 'have', 'internet', 'on', 'computers', 'now']
tags = [t for w, t in nltk.pos_tag(sent)]
print(tags)

['IN', 'PRP', 'VBP', 'JJ', 'IN', 'NNS', 'RB']

From there you can proceed exactly as in the "contains-a-word" approach, only with tags in place of words.
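A minimal sketch of that idea: feed "contains-a-tag" features (plus tag bigrams, so some of the sequence information survives) into nltk.NaiveBayesClassifier. The two class labels and the tag sequences below are invented stand-ins for the output of nltk.pos_tag on your own sentences:

```python
import nltk

def tag_features(tags):
    # "contains-a-tag" features, plus adjacent-tag bigrams so part of
    # the sequence information is kept.
    feats = {'contains({})'.format(t): True for t in tags}
    feats.update({'bigram({}+{})'.format(a, b): True
                  for a, b in zip(tags, tags[1:])})
    return feats

# Hypothetical training data: POS-tag sequences with class labels.
train = [
    (['PRP', 'VBP', 'NN'], 'statement'),
    (['PRP', 'VBD', 'DT', 'NN'], 'statement'),
    (['MD', 'PRP', 'VB', 'NN'], 'question'),
    (['WP', 'VBZ', 'DT', 'NN'], 'question'),
]

classifier = nltk.NaiveBayesClassifier.train(
    [(tag_features(tags), label) for tags, label in train])

print(classifier.classify(tag_features(['MD', 'PRP', 'VB'])))  # -> 'question'
```

In a real pipeline you would obtain each sentence's tag sequence with `[t for w, t in nltk.pos_tag(sent)]` as shown above and feed it through the same feature function.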

char bugs answered Sep 21 '22 03:09