I'm new to SVM and e1071. I found that the results are different every time I run the exact same code.
For example:
data(iris)
library(e1071)
model <- svm(Species ~ ., data = iris[-150,], probability = TRUE)
pred <- predict(model, iris[150,-5], probability = TRUE)
result1 <- as.data.frame(attr(pred, "probabilities"))
model <- svm(Species ~ ., data = iris[-150,], probability = TRUE)
pred <- predict(model, iris[150,-5], probability = TRUE)
result2 <- as.data.frame(attr(pred, "probabilities"))
Then I got result1 as:
         setosa versicolor virginica
150 0.009704854  0.1903696 0.7999255
and result2 as:
        setosa versicolor virginica
150 0.01006306  0.1749947 0.8149423
and the results keep changing on every run.
Here I'm using the first 149 rows as the training set and the last row as the test sample. The probabilities for each class in result1 and result2 are not exactly the same. I'm guessing that some step in fitting or prediction is random. How does this happen?
I'm aware that the predicted probabilities can be fixed if I call set.seed() with the same number before each call. I'm not aiming for a fixed prediction result; I'm just curious why this happens and what steps are taken to generate the predicted probabilities.
The slight difference doesn't have much impact on the iris data, since the last sample is still predicted as "virginica". But when my own data (with two classes, A and B) is not that "good", and an unknown sample is predicted to be class A with probability 0.489 on one run and 0.521 on the next, it becomes confusing.
Thanks!
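libsvm, the library behind e1071's svm(), runs an internal 5-fold cross-validation when it fits the probability model (probability = TRUE), and the folds are assigned by a random shuffle. Different shuffles give slightly different fitted parameters (probA and probB), and therefore slightly different probabilities. In e1071's copy of the source, the code for that step starts with: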
// Cross-validation decision values for probability estimates
static void svm_binary_svc_probability(
    const svm_problem *prob, const svm_parameter *param,
    double Cp, double Cn, double& probA, double& probB)
{
    int i;
    int nr_fold = 5;
    int *perm = Malloc(int,prob->l);
    double *dec_values = Malloc(double,prob->l);

    // random shuffle
    GetRNGstate();
    for(i=0;i<prob->l;i++) perm[i]=i;
    for(i=0;i<prob->l;i++)
    {
        int j = i+((int) (unif_rand() * (prob->l-i))) % (prob->l-i);
        swap(perm[i],perm[j]);
    }
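To make the effect of that shuffle concrete, here is a minimal base-R sketch that mimics the permutation loop above and then cuts the permutation into 5 contiguous folds. It is an illustration only, not the code e1071 runs: the helper name make_folds is hypothetical, and the contiguous-chunk fold split is an assumption about the part of the function not shown in the excerpt.

```r
# Hypothetical helper: mimic the shuffle from the C++ excerpt in base R.
make_folds <- function(n, nr_fold = 5) {
  perm <- seq_len(n)
  for (i in seq_len(n)) {
    remaining <- n - i + 1
    # pick j uniformly from i..n, as the C++ loop does with unif_rand()
    j <- i + (floor(runif(1) * remaining) %% remaining)
    tmp <- perm[i]
    perm[i] <- perm[j]
    perm[j] <- tmp
  }
  # assumption: fold k takes a contiguous chunk of the shuffled indices
  fold_start <- floor((0:nr_fold) * n / nr_fold)
  lapply(seq_len(nr_fold), function(k) {
    perm[(fold_start[k] + 1):fold_start[k + 1]]
  })
}

set.seed(1); folds_a <- make_folds(149)
set.seed(2); folds_b <- make_folds(149)
set.seed(1); folds_a_again <- make_folds(149)

identical(folds_a, folds_a_again)  # same seed reproduces the same folds
identical(folds_a, folds_b)        # a different seed almost surely does not
```

Because each fold is used to produce held-out decision values, a different fold assignment changes the values that the probability model is fitted to, which is where the run-to-run differences come from.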
You can create "predictability" by setting the random seed just before the call:
> data(iris)
> library(e1071)
> set.seed(123)
> model <- svm(Species ~ ., data = iris[-150,], probability = TRUE)
> pred <- predict(model, iris[150,-5], probability = TRUE)
> result1 <- as.data.frame(attr(pred, "probabilities"))
> set.seed(123)
> model <- svm(Species ~ ., data = iris[-150,], probability = TRUE)
> pred <- predict(model, iris[150,-5], probability = TRUE)
> result2 <- as.data.frame(attr(pred, "probabilities"))
> result1
         setosa versicolor virginica
150 0.009114718  0.1734126 0.8174727
> result2
         setosa versicolor virginica
150 0.009114718  0.1734126 0.8174727
But I am reminded of the epigram from Emerson: "A foolish consistency is the hobgoblin of little minds."
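If the size of the run-to-run variation is what matters (as in the 0.489 versus 0.521 case from the question), one option is to refit under several seeds and look at the spread rather than pin a single seed. The following is a rough sketch under that assumption; the helper name prob_for_seed and the choice of 20 seeds are arbitrary and only for illustration.

```r
library(e1071)
data(iris)

# Hypothetical helper: refit with a given seed and return the class
# probabilities predicted for the held-out row 150.
prob_for_seed <- function(seed) {
  set.seed(seed)
  model <- svm(Species ~ ., data = iris[-150, ], probability = TRUE)
  pred <- predict(model, iris[150, -5], probability = TRUE)
  attr(pred, "probabilities")[1, ]
}

# One row per seed, one column per class
probs <- t(sapply(1:20, prob_for_seed))

# Minimum and maximum probability seen for each class across the seeds
apply(probs, 2, range)
```

A wide range for a class is a sign that the probability estimate for that sample is unstable, so a value near 0.5 should not be read as a firm call either way.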