Unfortunately, I have problems using predict() in the following simple example:
library(e1071)
x <- c(1:10)
y <- c(0,0,0,0,1,0,1,1,1,1)
test <- c(11:15)
mod <- svm(y ~ x, kernel = "linear", gamma = 1, cost = 2, type="C-classification")
predict(mod, newdata = test)
The result is as follows:
> predict(mod, newdata = test)
1 2 3 4 <NA> <NA> <NA> <NA> <NA> <NA>
0 0 0 0 0 1 1 1 1 1
Can anybody explain why predict() only gives the fitted values of the training sample (x,y) and does not care about the test-data?
Thank you very much for your help!
Richard
This happens when the columns in the test and training data aren't the same. Try str(training.data) and str(testing.data); they should have the same variables, except for the one that needs to be predicted. Include only those factors you want to use for prediction in the svm training model.
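A quick way to run that check, as a minimal sketch: the data frames `train` and `test_df` below are made-up stand-ins (they are not from the question), and `setdiff()` on the column names lists any predictor the model was trained on that the new data lacks.

```r
# Hypothetical training and test data frames (names are illustrative only)
train   <- data.frame(x = 1:10, y = factor(c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)))
test_df <- data.frame(x = 11:15)

# Inspect the structure of both data frames
str(train)
str(test_df)

# Predictors are all training columns except the response "y"
predictors <- setdiff(names(train), "y")

# Any predictor name listed here is missing from the test data
missing_in_test <- setdiff(predictors, names(test_df))
missing_in_test
```

An empty result means every predictor used in training is also present, under the same name, in the data passed as newdata.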
This function predicts values based upon a model trained by svm.
# S3 method for svm
predict(object, newdata, decision.values = FALSE, probability = FALSE, ..., na.action = na.omit)
object: Object of class "svm", created by svm.
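As a rough illustration of the decision.values argument in that signature, the sketch below refits the model from the question on a data frame and asks predict() for the decision values as well. The data frame name DF and the choice to store y as a factor are assumptions made for the example, not part of the documentation excerpt.

```r
library(e1071)

# Training data from the question, placed in a data frame (y stored as a factor)
DF <- data.frame(x = 1:10, y = factor(c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)))
mod <- svm(y ~ x, data = DF, kernel = "linear", cost = 2,
           type = "C-classification")

# newdata must be a data frame with a column named x
pred <- predict(mod, newdata = data.frame(x = 11:15), decision.values = TRUE)

# One prediction per row of newdata
length(pred)

# The decision values are attached to the result as an attribute
attr(pred, "decision.values")
```

The length of the result is a handy sanity check: it should match the number of rows in newdata, not the size of the training set.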
It looks like this is because you misuse the formula interface to svm(). Normally, one supplies a data frame or similar object within which the variables in the formula are searched for. It usually doesn't matter if you don't do this, even if it is not best practice, but when you want to predict, not putting variables in a data frame gets you in a right mess. The reason it returns the training data is that the newdata you provide is not an object with a component named x in it. Hence it can't find the new data x, so it returns the fitted values. This is common for most R predict methods I know.
The solution then is to i) put your training data in a data frame and pass this to svm as the data argument, and ii) supply a new data frame containing x (from test) to predict(). E.g.:
> DF <- data.frame(x = x, y = y)
> mod <- svm(y ~ x, data = DF, kernel = "linear", gamma = 1, cost = 2,
+ type="C-classification")
> predict(mod, newdata = data.frame(x = test))
1 2 3 4 5
1 1 1 1 1
Levels: 0 1
You need newdata to be of the same form, i.e. using a data.frame helps:
R> library(e1071)
Loading required package: class
R> df <- data.frame(x=1:10, y=sample(c(0,1), 10, rep=TRUE))
R> mod <- svm(y ~ x, kernel = "linear", gamma = 1,
+ cost = 2, type="C-classification", data=df)
R> newdf <- data.frame(x=11:15)
R> predict(mod, newdata=newdf)
1 2 3 4 5
0 0 0 0 0
Levels: 0 1
R>
By the way, this is also shown in the help page for svm():
## density-estimation
# create 2-dim. normal with rho=0:
X <- data.frame(a = rnorm(1000), b = rnorm(1000))
attach(X)
# traditional way:
m <- svm(X, gamma = 0.1)
# formula interface:
m <- svm(~., data = X, gamma = 0.1)
# or:
m <- svm(~ a + b, gamma = 0.1)
# test:
newdata <- data.frame(a = c(0, 4), b = c(0, 4))
predict (m, newdata)
So in sum, use the formula interface and supply a data.frame; that is how essentially all modeling functions in R work.