I am using the Auto data in R. I need to plot the confidence intervals, but I am struggling. This is what I have so far.
I have created the linear regression model:
my_acc<-auto_df$acceleration
my_horse<-auto_df$horsepower
mydata <- data.frame(my_acc, my_horse )
car_linear_regression <- lm(my_acc ~ my_horse, mydata )
I have created the confidence interval and the prediction interval for ONE prediction, as the exercise asks:
conf_int<-predict(car_linear_regression,newdata = data.frame(my_horse = 93.5),interval = 'confidence' )
# the column name in newdata (my_horse) must match the predictor name used in the model formula
pred_int<-predict(car_linear_regression,newdata = data.frame(my_horse = 93.5),interval = 'prediction' )
Then I am trying to plot everything together, but I am totally stuck. I can plot the data with the regression line, but I only get this error:
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
plot(my_acc ~ my_horse , data = mydata, pch = 20, cex = 1.5, col="blue", xlab=" car horsepower", ylab = "acceleration secs to 100km/h", main = "Confidence intervals and prediction intervals")
abline(car_linear_regression, lwd = 5, col="red" )
lines(mydata$my_horse, conf_int[,"lwr"], col="red", type="b", pch="+")
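The error occurs because conf_int holds a single row (one prediction), while mydata$my_horse has one value per observation, so lines receives x and y of different lengths. For the plot you definitely need predictions over the whole range of horsepower, i.e. from its minimum to its maximum.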
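To see where the error comes from, it can help to compare lengths directly. The following is a minimal sketch that uses the built-in mtcars data (qsec on hp) as a stand-in for the asker's auto_df; that substitution is an assumption made only so the snippet runs without extra packages, and the names toy_fit and one_pt are purely illustrative.

```r
# Stand-in data (assumption): mtcars ships with R; qsec ~ hp plays the
# role of acceleration ~ horsepower for illustration only.
toy_fit <- lm(qsec ~ hp, data = mtcars)

# Predicting at a single x value returns a 1 x 3 matrix (fit, lwr, upr).
one_pt <- predict(toy_fit, newdata = data.frame(hp = 93.5),
                  interval = "confidence")
dim(one_pt)

# lines(x, y) needs x and y of equal length; here they differ,
# which is what triggers "'x' and 'y' lengths differ".
length(mtcars$hp)
length(one_pt[, "lwr"])
```

Passing a whole vector of predictor values in newdata gives one row per value, so x and y line up.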
data('Auto', package='ISLR')
fo <- acceleration ~ horsepower ## formula object for re-use
fit <- lm(fo, Auto)
We will need a sequence over the range of the predictor horsepower, so a glance at the summary is helpful.
summary(Auto)
Then we create a sequence for plotting with a reasonable step size. These are the x values at which the interval lines will be drawn.
n_data <- with(Auto, seq(min(horsepower), max(horsepower), by=1))
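If the predictor is not on a roughly integer scale, a fixed step of 1 may be too coarse or too fine. One possible alternative, sketched here on the mtcars stand-in (an assumption, not the Auto data), is to ask seq for a fixed number of points instead; the names hp_range and x_grid and the choice of 200 points are arbitrary.

```r
# Stand-in predictor (assumption): mtcars$hp instead of Auto$horsepower.
hp_range <- range(mtcars$hp)

# 200 evenly spaced points from min to max, whatever the units are.
x_grid <- seq(hp_range[1], hp_range[2], length.out = 200)
head(x_grid, 3)
```

Either form works for plotting; length.out simply decouples the smoothness of the curve from the scale of the predictor.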
Now calculate the predictions over this sequence,
conf_int <- predict(fit, newdata=list(horsepower=n_data),
                    interval='confidence', level=.99)
pred_int <- predict(fit, newdata=list(horsepower=n_data),
                    interval='prediction', level=.99)
and plot the result.
plot(fo, data=Auto, pch=20, cex=1, col="blue",
     xlab=" car horsepower", ylab="acceleration secs to 100km/h",
     main="Confidence intervals and prediction intervals",
     xlim=range(Auto$horsepower))  ## hp_rg was never defined; use the data range
abline(fit, lwd=2, col="red")
## columns 2:3 of the predict() output are the lower and upper bounds
matlines(n_data, conf_int[, 2:3], lty='dashed', col="red", lwd=2)
matlines(n_data, pred_int[, 2:3], lty='dashed', col="green", lwd=2)
legend('topright', legend=c('conf_int', 'pred_int'), col=c("red", "green"),
       lty=2, lwd=2)
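The confidence band describes uncertainty about the mean response, while the prediction band also includes the residual noise of a new observation, which is why it is the wider of the two. A minimal sketch of that difference, computed by hand with the standard simple-regression formulas and checked against predict, is given below. It uses the built-in mtcars data (qsec on hp) as a stand-in, which is an assumption for runnability, and the point x0 = 100 and all variable names are arbitrary.

```r
# Stand-in data (assumption): qsec ~ hp from the built-in mtcars.
toy_fit <- lm(qsec ~ hp, data = mtcars)
x  <- mtcars$hp
x0 <- 100                      # arbitrary point at which to predict
n  <- length(x)

s     <- sigma(toy_fit)                       # residual standard error
y_hat <- unname(coef(toy_fit)[1] + coef(toy_fit)[2] * x0)
t_q   <- qt(1 - 0.01 / 2, df = n - 2)         # two-sided 99% quantile

# Term shared by both intervals: 1/n + (x0 - mean)^2 / Sxx.
h0 <- 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)

ci_hand <- y_hat + c(-1, 1) * t_q * s * sqrt(h0)      # mean response
pi_hand <- y_hat + c(-1, 1) * t_q * s * sqrt(1 + h0)  # new observation

# Compare with predict(); both should agree up to rounding.
ci_pred <- predict(toy_fit, data.frame(hp = x0),
                   interval = "confidence", level = 0.99)
pi_pred <- predict(toy_fit, data.frame(hp = x0),
                   interval = "prediction", level = 0.99)
all.equal(ci_hand, unname(ci_pred[1, c("lwr", "upr")]))
all.equal(pi_hand, unname(pi_pred[1, c("lwr", "upr")]))
```

The only difference between the two formulas is the extra 1 under the square root, so the prediction interval can never be narrower than the confidence interval at the same level.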

Note that I've used matlines here, which is more concise because it draws both bounds in one call; you could also use lines(n_data, conf_int[, 2], ..) and lines(n_data, conf_int[, 3], ..) if you prefer.
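For completeness, the lines variant might look roughly like the sketch below. It again runs on the built-in mtcars data (qsec on hp) as a stand-in for Auto, which is an assumption, and toy_fit, x_grid and ci are illustrative names only.

```r
# Stand-in data (assumption): qsec ~ hp from the built-in mtcars.
toy_fit <- lm(qsec ~ hp, data = mtcars)
x_grid  <- seq(min(mtcars$hp), max(mtcars$hp), by = 1)
ci <- predict(toy_fit, newdata = data.frame(hp = x_grid),
              interval = "confidence", level = 0.99)

plot(qsec ~ hp, data = mtcars, pch = 20, col = "blue")
abline(toy_fit, lwd = 2, col = "red")
# One lines() call per bound; x_grid and each column have equal length.
lines(x_grid, ci[, "lwr"], lty = "dashed", col = "red", lwd = 2)
lines(x_grid, ci[, "upr"], lty = "dashed", col = "red", lwd = 2)
```

Selecting the columns by name ("lwr", "upr") rather than by position makes it a little harder to mix up the bounds.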