Why does naiveBayes return all NA's for multiclass classification in R?

Tags:

r

Started to write this question, and then figured out the answer. Going to put it here for posterity, since it was hard to find answers on this.

I'm trying to use the naiveBayes classifier from the e1071 package. It seems to have no trouble generating predictions for new data, but I actually need the probability estimates for the classes of the new data.

Example:

> model <- naiveBayes(formula=as.factor(V11)~., data=table, laplace=3)
> predict(model, table[,1:10]) 
[1] 4 4 4 4 4 4 4 4 1 1 1 3 3 1 1
> predict(model, table[,1:10], type="raw")
       1  2  3  4
 [1,] NA NA NA NA
 [2,] NA NA NA NA
 [3,] NA NA NA NA
 [4,] NA NA NA NA
 [5,] NA NA NA NA
 [6,] NA NA NA NA
 [7,] NA NA NA NA
 [8,] NA NA NA NA
 [9,] NA NA NA NA
[10,] NA NA NA NA
[11,] NA NA NA NA
[12,] NA NA NA NA
[13,] NA NA NA NA
[14,] NA NA NA NA
[15,] NA NA NA NA

This seems absurd to me, since the fact that the model is able to output predictions means it must have probability estimates for the classes. What is causing this strange behaviour?

Some things I've already tried without success:

Adding type="raw" to the model construction call.
Using the NaiveBayes function from the klaR package instead (which cannot handle the .

An example of some data which produces this error:

table[1:5,]
  V1 V2       V3         V4        V5        V6        V7        V8        V9
1  0  0 0.000000  0.0000000  0.000000 0.0000000 0.6711444 0.7110409 0.0000000
2  0  0 0.000000  0.0000000 -1.345804 2.1978370 0.6711444 0.7110409 0.0000000
3  0  0 1.923538 -3.6718725  0.000000 0.0000000 0.0000000 0.0000000 0.8980172
4  0  0 1.923538 -0.4079858  0.000000 0.0000000 0.0000000 0.0000000 0.8980172
5  0  0 0.000000  0.0000000 -1.345804 0.2930449 0.6711444 0.7110409 0.0000000
         V10 V11
1  0.0000000   6
2  0.0000000   3
3 -3.1316213   2
4 -0.2170431   5
5  0.0000000   4


asked Jul 28 '13 01:07

John Doucette

1 Answer

This is happening because one of the classes in the dataset has only one instance.
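A likely mechanism, offered as a sketch and not checked against the e1071 source: for numeric predictors, naiveBayes estimates a mean and a standard deviation per class, and in R the sd() of a single value is NA. The toy data below (the names labels and x are made up for illustration) shows how to spot such singleton classes and how the NA arises, using base R only:

```r
# Hypothetical toy data: class "c" has a single observation.
labels <- factor(c("a", "a", "b", "b", "c"))
x <- c(1.0, 1.2, 3.1, 2.9, 5.0)

# Count observations per class; any class with a count of 1 is a suspect.
class_counts <- table(labels)
singleton_classes <- names(class_counts)[class_counts == 1]
print(singleton_classes)

# The per-class standard deviation is NA for the singleton class,
# because sd() of a single value is undefined.
per_class_sd <- tapply(x, labels, sd)
print(per_class_sd)
```

Running the same table() check on the class column of your own data is a quick way to confirm whether this is the cause.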

An easy fix for my application was to clone that record and add a tiny amount of noise, after which predict works as expected.
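A minimal sketch of that clone-and-jitter idea, in base R. The helper duplicate_singletons, its noise_sd parameter, and the toy data frame are hypothetical names introduced here for illustration; they are not part of e1071, and the noise scale is an assumption you would tune to your data:

```r
# Hypothetical helper (not part of e1071): duplicate each row whose class
# occurs only once, adding small Gaussian noise to the numeric predictors.
duplicate_singletons <- function(df, class_col, noise_sd = 1e-6) {
  counts <- table(df[[class_col]])
  singles <- names(counts)[counts == 1]
  clones <- df[as.character(df[[class_col]]) %in% singles, , drop = FALSE]
  predictors <- setdiff(names(df), class_col)
  for (p in predictors) {
    if (is.numeric(clones[[p]])) {
      # Jitter only the cloned rows; the original rows are left untouched.
      clones[[p]] <- clones[[p]] + rnorm(nrow(clones), sd = noise_sd)
    }
  }
  rbind(df, clones)
}

set.seed(1)  # make the jitter reproducible
toy <- data.frame(V1 = c(0.1, 0.2, 0.9, 1.1, 5.0),
                  V2 = c(1.0, 1.1, 2.0, 2.1, 7.0),
                  cls = c(1, 1, 2, 2, 3))
fixed <- duplicate_singletons(toy, "cls")
print(table(fixed$cls))
```

After this step every class has at least two rows, so a per-class standard deviation can be computed for each numeric predictor.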

Edit: it seems that adding noise is not always required. Here's a really simple example that resolves the problem for the dataset posted in the question, simply by adding an extra copy of every row in the table:

> table <- as.data.frame(rbind(as.matrix(table), as.matrix(table)))
> nms <- colnames(table)
> model <- naiveBayes(table[,1:(length(nms)-1)], factor(table[,length(nms)]))
> predict(model, table[,1:(length(nms)-1)], type='raw')
                 2            3            4            5            6
 [1,] 2.480502e-34 6.283185e-12 6.283185e-12 2.480502e-34 1.000000e+00
 [2,] 1.558542e-45 9.999975e-01 2.506622e-06 1.558542e-45 6.283170e-12
 [3,] 1.000000e+00 1.558545e-45 1.558545e-45 6.283185e-12 2.480502e-34
 [4,] 6.283185e-12 1.558545e-45 1.558545e-45 1.000000e+00 2.480502e-34
 [5,] 1.558542e-45 2.506622e-06 9.999975e-01 1.558542e-45 6.283170e-12
 [6,] 2.480502e-34 6.283185e-12 6.283185e-12 2.480502e-34 1.000000e+00
 [7,] 1.558542e-45 9.999975e-01 2.506622e-06 1.558542e-45 6.283170e-12
 [8,] 1.000000e+00 1.558545e-45 1.558545e-45 6.283185e-12 2.480502e-34
 [9,] 6.283185e-12 1.558545e-45 1.558545e-45 1.000000e+00 2.480502e-34
[10,] 1.558542e-45 2.506622e-06 9.999975e-01 1.558542e-45 6.283170e-12


answered Oct 27 '22 02:10

John Doucette
