
K-Fold Cross Validation for Naive Bayes Classifier

I created a classifier using NLTK that classifies reviews into three classes: pos, neg, and neu.

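from nltk.classify import NaiveBayesClassifier

def get_feature(word):
    # Feature dict for a single word
    return {word: True}

def bag_of_words(words):
    # Feature dict marking every word in the list as present
    return {word: True for word in words}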

def create_training_dict(text, sense):
    ''' returns a one-item list of (feature dict, label) for a classifier '''
    tokens = extract_words(text)
    return [(bag_of_words(tokens), sense)]

def get_train_set(texts):
    train_set = []
    for words, sense in texts:
        # One (feature dict, label) pair per word
        train_set.extend((get_feature(word), sense) for word in words)
    return train_set

doc_bow.append((top_tfidf,polarity))

train_set = get_train_set(doc_bow)
classifier = NaiveBayesClassifier.train(train_set)

decision = classifier.classify(bag_of_words(tokens))

Now I want to run 10-fold cross-validation to test the classifier. I found this example that uses sklearn:

import numpy as np
from sklearn import cross_validation
from sklearn.naive_bayes import MultinomialNB

target = np.array( [x[0] for x in train_set] )
train = np.array( [x[1:] for x in train_set] )
cfr = MultinomialNB()

#Simple K-Fold cross validation. 10 folds.
cv = cross_validation.KFold(len(train_set), k=10, indices=False)
results = []
for traincv, testcv in cv:
    probas = cfr.fit(train[traincv], target[traincv]).predict_proba(train[testcv])
    results.append( myEvaluationFunc(target[testcv], [x[1] for x in probas]) )
print "Results: " + str( np.array(results).mean() )

I am getting this error:

raise ValueError("Input X must be non-negative.")
ValueError: Input X must be non-negative.

I am not sure whether the parameters I am passing in are correct.

user236501 asked Oct 22 '22 11:10



1 Answer

MultinomialNB is intended for use with non-negative feature values.

Did you try GaussianNB?
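
For what it's worth, here is a minimal sketch of 10-fold cross-validation that feeds NLTK-style feature dicts to MultinomialNB. It assumes a recent scikit-learn, where sklearn.model_selection replaced the old cross_validation module, and the toy train_set below is made up purely for illustration. DictVectorizer turns each {word: True} dict into a row of 0/1 values, so the input is non-negative. Also note that in the question's snippet x[0] is the feature dict and x[1:] holds the label, so target and train look swapped.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data in the same (feature dict, label) shape
# that get_train_set() produces in the question.
train_set = [
    ({"good": True}, "pos"), ({"great": True}, "pos"),
    ({"love": True}, "pos"), ({"good": True, "love": True}, "pos"),
    ({"bad": True}, "neg"), ({"awful": True}, "neg"),
    ({"hate": True}, "neg"), ({"bad": True, "hate": True}, "neg"),
    ({"okay": True}, "neu"), ({"average": True}, "neu"),
    ({"fine": True}, "neu"), ({"okay": True, "fine": True}, "neu"),
]

# Features are the first tuple element, labels the second.
features = [feats for feats, label in train_set]
labels = [label for feats, label in train_set]

# DictVectorizer maps each dict to a non-negative 0/1 row;
# the pipeline refits it on the training part of every fold.
model = make_pipeline(DictVectorizer(), MultinomialNB())

# 10 shuffled folds; the default score for a classifier is accuracy.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, features, labels, cv=cv)

print("Mean accuracy over %d folds: %.3f" % (len(scores), scores.mean()))
```

GaussianNB could be swapped in the same way, but it needs a dense array, so the vectorizer would have to be created as DictVectorizer(sparse=False).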

user1756896 answered Nov 02 '22 07:11
