
Naive Bayes: the within-class variance in each feature of TRAINING must be positive

When trying to fit Naive Bayes:

    training_data = sample;
    target_class = K8;

    % train model
    nb = NaiveBayes.fit(training_data, target_class);

    % prediction
    y = nb.predict(cluster3);

I get an error:

??? Error using ==> NaiveBayes.fit>gaussianFit at 535
The within-class variance in each feature of TRAINING
must be positive. The within-class variance in feature
2 5 6 in class normal. are not positive.

Error in ==> NaiveBayes.fit at 498
            obj = gaussianFit(obj, training, gindex);

Can anyone shed light on this and how to solve it? Note that I have read a similar post here, but I am not sure what to do. It seems as if it is trying to fit based on columns rather than rows; I thought the class variance should be based on the probability of each row belonging to a specific class. If I delete those columns it works, but obviously that is not what I want to do.

G Gr asked Nov 17 '12


People also ask

What is the assumption of Naive Bayes classifier?

In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter.
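The independence assumption can be made concrete with a short MATLAB sketch (the numbers and the per-feature Gaussian parameters here are purely illustrative; normpdf requires the Statistics Toolbox):

    x  = [3.1 0.7];                        % one observation with two features
    mu = [3.0 0.5];  sigma = [0.2 0.1];    % per-feature Gaussian parameters for one class
    % the "naive" step: the class-conditional likelihood factorizes per feature
    likelihood = prod(normpdf(x, mu, sigma));
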

How Naive Bayes classification works?

The Naive Bayes classifier works on the principle of conditional probability, as given by Bayes' theorem. When calculating probabilities we usually denote probability as P. For example, when tossing two fair coins, the probability of getting two heads is P(HH) = 1/4.

What does Naive Bayes do?

Naive Bayes utilizes the most fundamental probability knowledge and makes a naive assumption that all features are independent. Despite the simplicity (some may say oversimplification), Naive Bayes gives a decent performance in many applications.

Is Naive Bayes linear decision boundary?

Naive Bayes is a linear classifier: it leads to a linear decision boundary in many common cases. One such case is where P(xα|y) is Gaussian and σα,c is identical for all classes c (but can differ across dimensions α). The boundaries of the ellipsoids then indicate regions of equal probability P(x|y).


1 Answer

Assuming that there is no bug anywhere in your code (or in the NaiveBayes code from MathWorks), and again assuming that your training_data is in the form NxD, where there are N observations and D features, then columns 2, 5, and 6 are constant (zero variance) within at least one class. This can happen if you have relatively little training data and a large number of classes, so that a single class may be represented by only a few observations. Since NaiveBayes by default models every feature as normally distributed, it cannot work with a feature that has zero variance within a single class. In other words, there is no way for NaiveBayes to find the parameters of the probability distribution by fitting a normal distribution to that feature of that specific class (note: the default for distribution is normal).
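You can confirm which features are the problem with a quick check (a sketch using the variable names from the question; adjust the comparison if target_class is numeric rather than a cell array of strings):

    classes = unique(target_class);
    for i = 1:numel(classes)
        rows = strcmp(target_class, classes{i});   % rows belonging to this class
        v = var(training_data(rows, :), 0, 1);     % within-class variance per column
        disp(classes{i}); disp(find(v <= 0));      % offending feature columns
    end
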

Take a look at the nature of your features. If they seem to not follow a normal distribution within each class, then normal is not the option you want to use. Maybe your data is closer to a multinomial model mn:

nb = NaiveBayes.fit(training_data, target_class, 'Distribution', 'mn');
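If the features are continuous but nearly constant within some class, 'kernel' is another documented value of the 'Distribution' parameter that avoids the zero-variance check by fitting a kernel density estimate per feature (a sketch, not a guarantee of better accuracy on your data):

    nb = NaiveBayes.fit(training_data, target_class, 'Distribution', 'kernel');
    y = nb.predict(cluster3);
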
Bee answered Sep 23 '22