
Simple example using BernoulliNB (naive Bayes classifier) in scikit-learn with Python - cannot explain classification

Using scikit-learn 0.10

Why does the following trivial code snippet:

import numpy as np
import sklearn
from sklearn.naive_bayes import BernoulliNB

print sklearn.__version__

# Two training samples: all ones labelled 1, all zeros labelled 2
X = np.array([ [1, 1, 1, 1, 1],
               [0, 0, 0, 0, 0] ])
print "X: ", X
Y = np.array([ 1, 2 ])
print "Y: ", Y

clf = BernoulliNB()
clf.fit(X, Y)
print "Prediction:", clf.predict( [0, 0, 0, 0, 0] )

print an answer of "1"? Having trained the model on [0, 0, 0, 0, 0] => 2, I was expecting "2" as the answer.

And why does replacing Y with

Y = np.array([ 3, 2 ])

give a different answer, class "2" (the correct one)? Isn't this just a class label?

Can someone shed some light on this?

asked Aug 04 '12 by MalteseUnderdog

2 Answers

By default, the smoothing parameter alpha is one. As msw said, your training set is very small, and with that much smoothing no information is left: both classes come out equiprobable, and the tie is then broken in favor of the lowest class label. That is why Y = [1, 2] yields "1" while Y = [3, 2] happens to yield the correct "2". If you set alpha to a very small value, you should see the result you expected.
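A minimal sketch of that suggestion, reusing the arrays from the question (alpha=1e-10 here is only for illustration, not a recommended setting):

import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([ [1, 1, 1, 1, 1],
               [0, 0, 0, 0, 0] ])
Y = np.array([ 1, 2 ])

# Near-zero smoothing instead of the default alpha=1.0, so the two
# training examples dominate the estimated feature probabilities.
clf = BernoulliNB(alpha=1e-10)
clf.fit(X, Y)
print "Prediction:", clf.predict([[0, 0, 0, 0, 0]])   # prints [2], as expected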

answered by Andreas Mueller


Your training set is too small, as can be shown by

clf.predict_proba(X)

which yields

array([[ 0.5,  0.5],
       [ 0.5,  0.5]])

which shows that the classifier views all classes as equiprobable. Compare with the example shown in the documentation for BernoulliNB, for which predict_proba() yields:

array([[ 2.71828146,  1.00000008,  1.00000004,  1.00000002,  1.        ],
       [ 1.00000006,  2.7182802 ,  1.00000004,  1.00000042,  1.00000007],
       [ 1.00000003,  1.00000005,  2.71828149,  1.        ,  1.00000003],
       [ 1.00000371,  1.00000794,  1.00000008,  2.71824811,  1.00000068],
       [ 1.00000007,  1.0000028 ,  1.00000149,  2.71822455,  1.00001671],
       [ 1.        ,  1.00000007,  1.00000003,  1.00000027,  2.71828083]])

where I applied numpy.exp() to the results to make them more readable. Obviously, the probabilities are not even close to equal, and in fact they classify the training set well.
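For reference, a sketch along the lines of that documentation example (six random binary samples, one class label each; the exact numbers vary from run to run):

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Six random 100-dimensional binary samples, each with its own class label.
X = np.random.randint(2, size=(6, 100))
Y = np.array([1, 2, 3, 4, 5, 6])

clf = BernoulliNB()
clf.fit(X, Y)
print np.exp(clf.predict_proba(X))   # exp() applied for readability, as above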

answered by msw