predict.svm does not predict new data

Tags:

r

svm

predict

libsvm

Unfortunately, I have problems using predict() in the following simple example:

library(e1071)

x <- c(1:10)
y <- c(0,0,0,0,1,0,1,1,1,1)
test <- c(11:15)

mod <- svm(y ~ x, kernel = "linear", gamma = 1, cost = 2, type="C-classification")

predict(mod, newdata = test)

The result is as follows:

> predict(mod, newdata = test)
   1    2    3    4 <NA> <NA> <NA> <NA> <NA> <NA> 
   0    0    0    0    0    1    1    1    1    1

Can anybody explain why predict() only gives the fitted values of the training sample (x,y) and does not care about the test-data?

Thank you very much for your help!

Richard

asked Dec 16 '10 15:12

Richard

2 Answers

It looks like this is because you misuse the formula interface to svm(). Normally, one supplies a data frame or similar object in which the variables in the formula are looked up. It usually doesn't matter if you don't do this, even if it is not best practice, but when you want to predict, not putting variables in a data frame gets you in a right mess. It returns the training data because you don't give newdata an object with a component named x in it. predict() therefore can't find the new data x and returns the fitted values instead. This is common for most R predict methods I know.

The solution then is to i) put your training data in a data frame and pass svm this as the data argument, and ii) supply a new data frame containing x (from test) to predict(). E.g.:

> DF <- data.frame(x = x, y = y)
> mod <- svm(y ~ x, data = DF, kernel = "linear", gamma = 1, cost = 2,
+ type="C-classification")
> predict(mod, newdata = data.frame(x = test))
1 2 3 4 5 
1 1 1 1 1 
Levels: 0 1
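
To catch this kind of mistake before calling predict(), one option is a small guard that checks newdata against the formula. This is only a sketch: check_newdata() is a hypothetical helper, not part of e1071, and it assumes the formula names its predictors explicitly (a formula such as y ~ . would need the data argument passed to terms()).

```r
# Hypothetical helper (not part of e1071): stop early if newdata cannot
# supply every predictor named in the model formula.
check_newdata <- function(formula, newdata) {
  if (!is.data.frame(newdata)) {
    stop("newdata must be a data frame, not a bare vector")
  }
  # Predictor names are the formula's variables minus the response.
  predictors <- all.vars(delete.response(terms(formula)))
  missing_cols <- setdiff(predictors, names(newdata))
  if (length(missing_cols) > 0) {
    stop("newdata lacks column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

test <- 11:15
check_newdata(y ~ x, data.frame(x = test))  # passes silently
try(check_newdata(y ~ x, test))             # error: not a data frame
```

Calling the guard right before predict(mod, newdata = ...) turns a silently wrong result into an explicit error message.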

answered Oct 18 '22 23:10

Gavin Simpson

You need newdata to be of the same form as the training data, i.e. using a data.frame helps:

R> library(e1071)
Loading required package: class
R> df <- data.frame(x=1:10, y=sample(c(0,1), 10, replace=TRUE))
R> mod <- svm(y ~ x, kernel = "linear", gamma = 1, 
+             cost = 2, type="C-classification", data=df)
R> newdf <- data.frame(x=11:15)
R> predict(mod, newdata=newdf)
1 2 3 4 5
0 0 0 0 0
Levels: 0 1
R>

By the way, this is also shown on the help page for svm():

 ## density-estimation

 # create 2-dim. normal with rho=0:
 X <- data.frame(a = rnorm(1000), b = rnorm(1000))
 attach(X)

 # traditional way:
 m <- svm(X, gamma = 0.1)

 # formula interface:
 m <- svm(~., data = X, gamma = 0.1)
 # or:
 m <- svm(~ a + b, gamma = 0.1)

 # test:
 newdata <- data.frame(a = c(0, 4), b = c(0, 4))
 predict (m, newdata)

So in sum, use the formula interface and supply a data.frame --- that is how essentially all modeling functions in R work.
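
The same lookup behaviour can be reproduced with base R alone. The sketch below uses lm() as a stand-in for svm() (an assumption made purely for illustration, so that no extra package is needed); the object names fit_loose, fit_df, p_wrong and p_right are made up for this example.

```r
# Base R only: lm() stands in for svm() to show the same variable lookup.
x <- 1:10
y <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)

# Fitted without `data`: the formula's variables come from the workspace.
fit_loose <- lm(y ~ x)

# newdata has no column named x, so x is found in the workspace again and
# the predictions are for the 10 training values (R also issues a warning,
# suppressed here to keep the output short).
p_wrong <- suppressWarnings(
  predict(fit_loose, newdata = data.frame(test = 11:15))
)
length(p_wrong)   # 10

# Fitted from a data frame, predicted on a data frame with a column x.
DF <- data.frame(x = x, y = y)
fit_df <- lm(y ~ x, data = DF)
p_right <- predict(fit_df, newdata = data.frame(x = 11:15))
length(p_right)   # 5
```

Comparing the length of the result with nrow(newdata) is a cheap sanity check that the new data was actually used.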

answered Oct 18 '22 23:10

Dirk Eddelbuettel
