
Why does naiveBayes return all NA's for multiclass classification in R?

Tags: r

Started to write this question, and then figured out the answer. Going to put it here for posterity, since it was hard to find answers on this.

I'm trying to use the naiveBayes classifier from the e1071 package. It seems to have no trouble generating predictions for new data, but I actually need the probability estimates for the classes of the new data.

Example:

> model <- naiveBayes(formula=as.factor(V11)~., data=table, laplace=3)
> predict(model, table[,1:10]) 
[1] 4 4 4 4 4 4 4 4 1 1 1 3 3 1 1
> predict(model, table[,1:10], type="raw")
       1  2  3  4
 [1,] NA NA NA NA
 [2,] NA NA NA NA
 [3,] NA NA NA NA
 [4,] NA NA NA NA
 [5,] NA NA NA NA
 [6,] NA NA NA NA
 [7,] NA NA NA NA
 [8,] NA NA NA NA
 [9,] NA NA NA NA
[10,] NA NA NA NA
[11,] NA NA NA NA
[12,] NA NA NA NA
[13,] NA NA NA NA
[14,] NA NA NA NA
[15,] NA NA NA NA

This seems absurd to me, since the fact that the model is able to output predictions means it must have probability estimates for the classes. What is causing this strange behaviour?

Some things I've already tried without success:

  • Adding type="raw" to the model construction call.
  • Using the NaiveBayes function from the klaR package instead (which cannot handle the .

An example of some data which produces this error:

table[1:5,]
  V1 V2       V3         V4        V5        V6        V7        V8        V9
1  0  0 0.000000  0.0000000  0.000000 0.0000000 0.6711444 0.7110409 0.0000000
2  0  0 0.000000  0.0000000 -1.345804 2.1978370 0.6711444 0.7110409 0.0000000
3  0  0 1.923538 -3.6718725  0.000000 0.0000000 0.0000000 0.0000000 0.8980172
4  0  0 1.923538 -0.4079858  0.000000 0.0000000 0.0000000 0.0000000 0.8980172
5  0  0 0.000000  0.0000000 -1.345804 0.2930449 0.6711444 0.7110409 0.0000000
         V10 V11
1  0.0000000   6
2  0.0000000   3
3 -3.1316213   2
4 -0.2170431   5
5  0.0000000   4
John Doucette asked Jul 28 '13 01:07



1 Answer

This is happening because one of the classes in the dataset has only one instance.
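A minimal sketch of why a singleton class could lead to NA values, assuming the classifier fits a per-class mean and standard deviation for each numeric predictor: in R the sample standard deviation of a single value is NA, and a Gaussian density computed from it is NA too. The data frame `toy` and its columns `x` and `y` are hypothetical and stand in for the real data; only base R is used.

```r
# Hypothetical toy data: class "c" has only one row.
toy <- data.frame(x = c(1.0, 1.2, 3.1, 2.9, 5.0),
                  y = factor(c("a", "a", "b", "b", "c")))

# Count rows per class; any class with a count of 1 is a suspect.
counts <- table(toy$y)
singletons <- names(counts)[counts == 1]
print(singletons)

# Per-class standard deviation of the predictor.
# sd() of a single value is NA, because it divides by n - 1 = 0.
sds <- tapply(toy$x, toy$y, sd)
print(sds)

# A Gaussian density evaluated with an NA standard deviation is NA as well.
dens <- dnorm(5.0, mean = 5.0, sd = sds[["c"]])
print(dens)
```

Running the same `table()` check on the class column of the real data (V11 in the question) is a quick way to spot classes with a single instance before fitting.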

An easy fix for my application was to clone that record and add a tiny amount of noise, after which predict works as expected.
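The clone-and-jitter workaround might look roughly like the sketch below. It uses base R only; the data frame `toy`, its columns, and the noise scale of 1e-3 are assumptions chosen for illustration, not the values used in my application.

```r
set.seed(1)  # make the jitter reproducible

# Hypothetical toy data: class "c" has only one row.
toy <- data.frame(x = c(1.0, 1.2, 3.1, 2.9, 5.0),
                  y = factor(c("a", "a", "b", "b", "c")))

# Find the rows that belong to classes occurring exactly once.
counts <- table(toy$y)
lonely <- toy[toy$y %in% names(counts)[counts == 1], , drop = FALSE]

# Clone those rows and perturb the numeric predictor slightly.
clone <- lonely
clone$x <- clone$x + rnorm(nrow(clone), sd = 1e-3)

# Append the clones; every class now has at least two rows,
# so a per-class standard deviation can be computed.
fixed <- rbind(toy, clone)
print(table(fixed$y))
print(tapply(fixed$x, fixed$y, sd))
```

With more than one numeric predictor, the same jitter would need to be applied to each predictor column of the cloned rows.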

Edit: it seems that adding noise is not always required. Here's a really simple example that resolves the problem for the dataset posted in the question, simply by adding an extra copy of every row in the table:

> table <- as.data.frame(rbind(as.matrix(table), as.matrix(table)))
> nms <- colnames(table)
> model <- naiveBayes(table[,1:(length(nms)-1)], factor(table[,length(nms)]))
> predict(model, table[,1:(length(nms)-1)], type='raw')
                 2            3            4            5            6
 [1,] 2.480502e-34 6.283185e-12 6.283185e-12 2.480502e-34 1.000000e+00
 [2,] 1.558542e-45 9.999975e-01 2.506622e-06 1.558542e-45 6.283170e-12
 [3,] 1.000000e+00 1.558545e-45 1.558545e-45 6.283185e-12 2.480502e-34
 [4,] 6.283185e-12 1.558545e-45 1.558545e-45 1.000000e+00 2.480502e-34
 [5,] 1.558542e-45 2.506622e-06 9.999975e-01 1.558542e-45 6.283170e-12
 [6,] 2.480502e-34 6.283185e-12 6.283185e-12 2.480502e-34 1.000000e+00
 [7,] 1.558542e-45 9.999975e-01 2.506622e-06 1.558542e-45 6.283170e-12
 [8,] 1.000000e+00 1.558545e-45 1.558545e-45 6.283185e-12 2.480502e-34
 [9,] 6.283185e-12 1.558545e-45 1.558545e-45 1.000000e+00 2.480502e-34
[10,] 1.558542e-45 2.506622e-06 9.999975e-01 1.558542e-45 6.283170e-12
John Doucette answered Oct 27 '22 02:10