I started writing this question and then figured out the answer. I'm posting it here for posterity, since answers on this were hard to find.
I'm trying to use the naiveBayes classifier from the e1071 package. It seems to have no trouble generating predictions for new data, but I actually need the probability estimates for the classes of the new data.
Example:
> library(e1071)
> model <- naiveBayes(formula=as.factor(V11)~., data=table, laplace=3)
> predict(model, table[,1:10])
[1] 4 4 4 4 4 4 4 4 1 1 1 3 3 1 1
> predict(model, table[,1:10], type="raw")
       1  2  3  4
 [1,] NA NA NA NA
 [2,] NA NA NA NA
 [3,] NA NA NA NA
 [4,] NA NA NA NA
 [5,] NA NA NA NA
 [6,] NA NA NA NA
 [7,] NA NA NA NA
 [8,] NA NA NA NA
 [9,] NA NA NA NA
[10,] NA NA NA NA
[11,] NA NA NA NA
[12,] NA NA NA NA
[13,] NA NA NA NA
[14,] NA NA NA NA
[15,] NA NA NA NA
This seems absurd to me, since the fact that the model is able to output predictions means it must have probability estimates for the classes. What is causing this strange behaviour?
Some things I've already tried without success:
An example of some data that produces this behaviour:
> table[1:5,]
  V1 V2       V3         V4        V5        V6        V7        V8        V9        V10 V11
1  0  0 0.000000  0.0000000  0.000000 0.0000000 0.6711444 0.7110409 0.0000000  0.0000000   6
2  0  0 0.000000  0.0000000 -1.345804 2.1978370 0.6711444 0.7110409 0.0000000  0.0000000   3
3  0  0 1.923538 -3.6718725  0.000000 0.0000000 0.0000000 0.0000000 0.8980172 -3.1316213   2
4  0  0 1.923538 -0.4079858  0.000000 0.0000000 0.0000000 0.0000000 0.8980172 -0.2170431   5
5  0  0 0.000000  0.0000000 -1.345804 0.2930449 0.6711444 0.7110409 0.0000000  0.0000000   4
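Summary: The e1071 package provides the naiveBayes function, which accepts both numeric and factor predictors. Laplace smoothing (which e1071 applies only to factor predictors, not numeric ones) stops a class from being ruled out just because one of its factor levels never occurs with it in the training data.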
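As a rough illustration of what smoothing does for a single factor predictor, here is a minimal base-R sketch of the usual add-`laplace` rule, (count + laplace) / (class total + laplace * number of levels). The helper `laplace_table` and the toy `colour`/`label` vectors are hypothetical; this is not e1071's internal code, although to my understanding e1071's estimate for factor predictors follows the same formula.

```r
# Hypothetical helper: Laplace-smoothed P(level | class) for ONE factor predictor.
laplace_table <- function(x, y, laplace = 1) {
  x <- as.factor(x)
  y <- as.factor(y)
  counts <- table(y, x)                  # rows = classes, columns = levels of x
  k <- nlevels(x)                        # number of levels of the predictor
  # Each row is divided by its own (class total + laplace * k), so rows sum to 1.
  (counts + laplace) / (rowSums(counts) + laplace * k)
}

# Toy data: "blue" and "green" never occur with class "a".
colour <- c("red", "red", "blue", "green", "blue")
label  <- c("a",   "a",   "b",    "b",     "b")

p0 <- laplace_table(colour, label, laplace = 0)  # zero cells for class "a"
p1 <- laplace_table(colour, label, laplace = 1)  # every cell strictly positive
p1
```

Without smoothing, a single zero cell multiplies the whole class likelihood by zero; with `laplace = 1` every level keeps a small non-zero probability.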
A naive Bayes classifier calculates the probability of each class in three steps:
Step 1: Calculate the prior probability of each class label.
Step 2: Find the likelihood of each attribute value given each class.
Step 3: Plug these values into Bayes' formula to calculate the posterior probability of each class.
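The three steps can be sketched in base R for numeric predictors, assuming (as e1071 does for numeric columns) a Gaussian likelihood per class and predictor. `gaussian_nb_fit`, `gaussian_nb_posterior` and the toy height/weight data are hypothetical; this is a simplified illustration, not e1071's implementation, and it skips details such as missing values and the probability thresholding that predict applies.

```r
# Hypothetical helpers: a bare-bones Gaussian naive Bayes for numeric predictors.
gaussian_nb_fit <- function(X, y) {
  y <- as.factor(y)
  list(
    prior  = as.numeric(table(y)) / length(y),                # Step 1: P(class)
    mean   = sapply(X, function(col) tapply(col, y, mean)),   # classes x predictors
    sd     = sapply(X, function(col) tapply(col, y, sd)),     # classes x predictors
    levels = levels(y)
  )
}

gaussian_nb_posterior <- function(model, x) {
  # Step 2: Gaussian likelihood of each predictor value under each class,
  # multiplied across predictors (the "naive" independence assumption).
  # x must list the predictors in the same order as the columns of X.
  lik <- sapply(seq_along(x), function(j) {
    dnorm(x[[j]], model$mean[, j], model$sd[, j])
  })
  joint <- model$prior * apply(lik, 1, prod)
  # Step 3: Bayes' formula, i.e. normalise so the posteriors sum to 1.
  setNames(joint / sum(joint), model$levels)
}

# Toy usage: two well-separated classes.
X <- data.frame(height = c(1.0, 1.2, 0.9, 3.0, 3.2, 2.8),
                weight = c(10, 12, 11, 30, 29, 31))
y <- c("small", "small", "small", "big", "big", "big")
model <- gaussian_nb_fit(X, y)
post <- gaussian_nb_posterior(model, c(height = 1.1, weight = 11))
post
```

Note that the class-level standard deviations come from `sd()`, which needs at least two rows per class to give a finite value.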
The naive Bayes classifier is one of the simplest and most effective classification algorithms; it builds fast models that make quick predictions. It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to each class.
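This is happening because one of the classes in the dataset has only one instance. For numeric predictors, naiveBayes fits a Gaussian per class, estimating its mean and standard deviation from that class's training rows. The standard deviation of a single value is NA in R, so the singleton class gets an NA likelihood, and normalizing the class scores into probabilities turns the whole row NA. The class predictions still come out because choosing the most likely class skips the NA entry.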
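A toy illustration (not e1071's code) of how a single-row class poisons the probabilities but not the class prediction. The likelihood values in `lik` are made up for illustration; only the NA behaviour matters.

```r
# sd() of a single observation is NA in R ...
sd(0.6711444)                         # NA
# ... and an NA standard deviation gives an NA Gaussian density.
dnorm(0.5, mean = 0.6711444, sd = NA)

# Made-up per-class likelihoods for one new row; class "6" had one training row.
lik <- c("2" = 1e-3, "3" = 0.2, "4" = 0.05, "6" = NA)

# Picking the most likely class silently skips the NA entry ...
names(which.max(lik))                 # "3"
# ... but normalising sums over every class, so the whole row becomes NA,
# mirroring the all-NA matrix returned by type = "raw".
lik / sum(lik)
```

Before fitting, `table(y)` on the class column shows whether any class has fewer than two rows.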
An easy fix for my application was to clone that record and add a tiny amount of noise, after which predict worked as expected.
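A minimal sketch of that workaround, assuming the class label lives in a column of the data frame (as V11 does in the question). The function `add_jittered_copies` and its `class_col`/`noise_sd` arguments are hypothetical names, and the noise level is an arbitrary choice.

```r
# Hypothetical helper: duplicate every row whose class occurs only once and
# add a little Gaussian noise to the numeric predictor columns of the copies.
add_jittered_copies <- function(df, class_col, noise_sd = 1e-6) {
  counts <- table(df[[class_col]])
  singletons <- names(counts)[counts == 1]
  extra <- df[as.character(df[[class_col]]) %in% singletons, , drop = FALSE]
  # Jitter numeric predictors only; never touch the class label itself.
  num_cols <- setdiff(names(extra)[sapply(extra, is.numeric)], class_col)
  for (col in num_cols) {
    extra[[col]] <- extra[[col]] + rnorm(nrow(extra), sd = noise_sd)
  }
  rbind(df, extra)
}

# Toy usage: classes 2 and 3 each have a single row.
set.seed(1)
df <- data.frame(V1 = c(0.1, 0.2, 0.3, 0.4), V11 = c(1, 1, 2, 3))
fixed <- add_jittered_copies(df, "V11")
table(fixed$V11)   # every class now has at least two rows
```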
Edit: it seems adding noise is not always required. Here's a really simple example that fixes the dataset posted in the question simply by adding an extra copy of every row in the table:
> table <- as.data.frame(rbind(as.matrix(table), as.matrix(table)))
> nms <- colnames(table)
> model <- naiveBayes(table[, 1:(length(nms)-1)], factor(table[, length(nms)]))
> predict(model, table[, 1:(length(nms)-1)], type='raw')
                 2            3            4            5            6
 [1,] 2.480502e-34 6.283185e-12 6.283185e-12 2.480502e-34 1.000000e+00
 [2,] 1.558542e-45 9.999975e-01 2.506622e-06 1.558542e-45 6.283170e-12
 [3,] 1.000000e+00 1.558545e-45 1.558545e-45 6.283185e-12 2.480502e-34
 [4,] 6.283185e-12 1.558545e-45 1.558545e-45 1.000000e+00 2.480502e-34
 [5,] 1.558542e-45 2.506622e-06 9.999975e-01 1.558542e-45 6.283170e-12
 [6,] 2.480502e-34 6.283185e-12 6.283185e-12 2.480502e-34 1.000000e+00
 [7,] 1.558542e-45 9.999975e-01 2.506622e-06 1.558542e-45 6.283170e-12
 [8,] 1.000000e+00 1.558545e-45 1.558545e-45 6.283185e-12 2.480502e-34
 [9,] 6.283185e-12 1.558545e-45 1.558545e-45 1.000000e+00 2.480502e-34
[10,] 1.558542e-45 2.506622e-06 9.999975e-01 1.558542e-45 6.283170e-12