
Abline not working with Linear regression Model

Tags:

r

I have a dataset in R that I want to test with various models. I have split it into two sets: 80% training and 20% testing. Now I want to fit a linear model on the training set and use it to predict on the testing set.

I have done this so far:

temp<-lm(formula = cityMpg ~ peakRpm+horsePower+wheelBase , data=train)
temp_test<- predict(temp,test)
plot(temp_test)

Here I get the scatter plot. Now I just want a line on this scatter plot. When I use abline(temp_test), I get an error. I want the line to be drawn automatically; I do not wish to specify the coordinates. The error is:

Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...) : 
      invalid a=, b= specification
Xylo_matic asked Jan 31 '26 12:01


1 Answer

As pointed out above, this is a bit tricky for a multi-dimensional model.
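For context on the error itself, here is a minimal sketch of why a vector of predictions cannot be handed to abline(). As far as I understand it, abline() wants either an intercept and slope (a=, b=), a length-2 coefficient vector, a fitted model, or h=/v= values; a longer numeric vector fits none of these. The numbers in `preds` are made up for illustration and only stand in for the output of predict().

```r
# Made-up values standing in for predict() output (not real data)
preds <- c(21.3, 24.8, 19.5, 30.1)
plot(preds)

# A numeric vector of length > 2 is not a valid intercept/slope specification,
# so this call fails; tryCatch() captures the message instead of stopping
msg <- tryCatch({
  abline(preds)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# These forms are accepted: explicit intercept and slope, or a horizontal line
abline(a = 15, b = 2)
abline(h = mean(preds), lty = 2)
```

So to draw a line "automatically" you need a two-coefficient description of it, which is what the rest of this answer constructs.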

Get some data (you neglected to include a reproducible example: see http://tinyurl.com/reproducible-000 ...)

library(foreign)
dat <- read.arff(url("http://www.cs.umb.edu/~rickb/files/UCI/autos.arff"))

Split into training and test data sets:

train <- dat[1:150,]
test <- dat[151:nrow(dat),]

The variable names are a bit awkward for R (the dashes are interpreted as minus operators, so we have to use back-quotes to protect the names):

fit <- lm(`city-mpg` ~ `peak-rpm`+horsepower+`wheel-base`,data=train)
temp_test <- predict(fit,test)

Plot the predictions vs peak RPM:

par(las=1,bty="l") ## cosmetic
plot(test[["peak-rpm"]],temp_test,xlab="peak rpm",ylab="predicted")

In order to add the line, we have to adjust the intercept according to some baseline values of the other parameters: we'll use the mean (another alternative is to center all the predictor variables before fitting the model):

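cf <- coef(fit)
## intercept evaluated at the test-set means of the other predictors;
## slope is the fitted coefficient for peak-rpm
abline(a=cf["(Intercept)"]+
          mean(test$horsepower)*cf["horsepower"]+
          mean(test$`wheel-base`)*cf["`wheel-base`"],
          b=cf["`peak-rpm`"])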
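The centering alternative mentioned above could look roughly like the following. This is only a sketch: it uses the built-in mtcars data as a stand-in for the autos data (an assumption made so the example runs without a download), with hp playing the role of peak-rpm, and the helper `center()` and the `_c` column names are hypothetical.

```r
# Stand-in data (assumption): built-in mtcars instead of autos.arff
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Baseline values for the predictors that are NOT on the x axis
ctr <- c(disp = mean(test$disp), wt = mean(test$wt))

# Hypothetical helper: subtract the baseline from those predictors
center <- function(d) {
  d$disp_c <- d$disp - ctr[["disp"]]
  d$wt_c   <- d$wt   - ctr[["wt"]]
  d
}

fit_c <- lm(mpg ~ hp + disp_c + wt_c, data = center(train))
cf_c  <- coef(fit_c)

plot(test$hp, predict(fit_c, center(test)), xlab = "hp", ylab = "predicted")
# The intercept already refers to disp and wt at their baseline values,
# so no manual adjustment is needed
abline(a = cf_c[["(Intercept)"]], b = cf_c[["hp"]])
```

The slope is unchanged by centering; only the intercept shifts, by the same amount that the manual adjustment adds.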

Another way to do this is to use predict():

newdat <- with(test,
            data.frame(horsepower=mean(horsepower),
                       "wheel-base"=mean(`wheel-base`),
                       "peak-rpm"=seq(min(`peak-rpm`),
                                        max(`peak-rpm`),
                                        length=41),
                       check.names=FALSE))
newdat["city-mpg"] <- predict(fit,newdat)
with(newdat,lines(`peak-rpm`,`city-mpg`,col=4))

(41 points is silly for a straight line -- we could have used just 2 -- but will work well if you want to plot something curved, like confidence intervals or a nonlinear fit.)
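As an illustration of the curved case, here is a sketch that adds a confidence band using the interval argument of predict(). It again assumes the built-in mtcars data as a stand-in for the autos data; the names `fit_ci`, `grid` and `ci` are just for this example.

```r
# Stand-in data (assumption): built-in mtcars instead of autos.arff
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]
fit_ci <- lm(mpg ~ hp + disp + wt, data = train)

# Grid over the x-axis variable, other predictors held at their test means
grid <- data.frame(hp   = seq(min(test$hp), max(test$hp), length.out = 41),
                   disp = mean(test$disp),
                   wt   = mean(test$wt))

# Matrix with columns "fit", "lwr", "upr"
ci <- predict(fit_ci, grid, interval = "confidence", level = 0.95)

pred_test <- predict(fit_ci, test)
plot(test$hp, pred_test, xlab = "hp", ylab = "predicted",
     ylim = range(ci, pred_test))
lines(grid$hp, ci[, "fit"], col = 4)
lines(grid$hp, ci[, "lwr"], col = 4, lty = 2)  # lower bound (curved)
lines(grid$hp, ci[, "upr"], col = 4, lty = 2)  # upper bound (curved)
```

Here the many grid points earn their keep: the band bounds are not straight lines, so two points would not be enough.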

Alternatively you could just fit the marginal model, but the actual fitted line is somewhat different (it will only be the same if all the predictors are orthogonal to each other):

fit2 <- lm(`city-mpg` ~ `peak-rpm`,data=train)
abline(fit2,col="red")
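To see how much the marginal slope can differ, here is a small simulated sketch (all data are generated, nothing here comes from the autos data). With correlated predictors the two slopes disagree; once the second predictor is made orthogonal to the first (and to the intercept) in the sample, they agree.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)     # deliberately correlated with x1
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

full     <- lm(y ~ x1 + x2)
marginal <- lm(y ~ x1)
# Slope for x1 in each model; these differ because x2 is correlated with x1
print(c(full = coef(full)[["x1"]], marginal = coef(marginal)[["x1"]]))

# Residualize x2 on x1 so it is orthogonal to x1 and to the constant
x2o <- resid(lm(x2 ~ x1))
y2  <- 1 + 2 * x1 - 3 * x2o + rnorm(n)
slope_full <- coef(lm(y2 ~ x1 + x2o))[["x1"]]
slope_marg <- coef(lm(y2 ~ x1))[["x1"]]
# Should agree up to numerical tolerance
print(all.equal(slope_full, slope_marg))
```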

Ben Bolker answered Feb 02 '26 02:02


