Computing Error Rate between two columns R

I have a matrix as below:

Real_Values Predicted_Values
5.5         5.67
6.9         7.01
9.8         9.2
6.5         6.1
10          9.7
1.5         1.0
7.7         7.01

I wish to compute the error rate of my model between the predicted and real values and ideally do a plot. I was wondering if R already has a package that neatly does this, so that I will avoid any for loops?

user6336850 asked Apr 01 '26 08:04
1 Answer

You can calculate regression error metrics such as the root mean squared error (RMSE) or the sum of squared errors (SSE) by hand, as pointed out by @nathan-day. Most modeling implementations will also compute these for you automatically, so you usually don't need to do it by hand.
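For the data from the question, the by-hand calculation is a one-liner per metric thanks to R's vectorized arithmetic (no loops needed). A minimal sketch, assuming the two columns are pulled out into plain numeric vectors:

```r
# The two columns from the question's matrix as numeric vectors
real      <- c(5.5, 6.9, 9.8, 6.5, 10, 1.5, 7.7)
predicted <- c(5.67, 7.01, 9.2, 6.1, 9.7, 1.0, 7.01)

err  <- predicted - real    # per-observation error, computed element-wise
sse  <- sum(err^2)          # sum of squared errors
rmse <- sqrt(mean(err^2))   # root mean squared error
```

If your values live in a matrix `m`, the same works with `m[, "Predicted_Values"] - m[, "Real_Values"]`.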

For the purpose of plotting I'll use a slightly bigger example with more samples, as it will be easier to understand (the iris dataset shipped with R). First we train a linear model with the caret package to predict the 4th feature from the first 3 features, which already computes some metrics:

> library(caret)
> model <- train(iris[,1:3], iris[,4], method = 'lm', metric = 'RMSE', trControl = trainControl(method = 'repeatedcv', number = 10, repeats = 10))
> print(model)
Linear Regression 

150 samples
3 predictors

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times) 

Summary of sample sizes: 134, 135, 135, 136, 134, 135, ... 

Resampling results

RMSE  Rsquared  RMSE SD  Rsquared SD
0.19  0.942     0.0399   0.0253   

The RMSE, SSE, etc. could now be calculated from the predicted and actual values of the target variable by hand too:

predicted <- predict(model, iris[,1:3]) # perform the prediction 
actual <- iris[,4]
sqrt(mean((predicted-actual)**2)) # RMSE
sum((predicted-actual)**2) # SSE
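Since the question asks for an "error rate", a relative measure may be what is wanted; the mean absolute error (MAE) and mean absolute percentage error (MAPE) follow the same vectorized pattern. A sketch with small placeholder vectors (substitute your own `predicted` and `actual`):

```r
# Placeholder example vectors; replace with your own predicted/actual values
actual    <- c(5.5, 6.9, 9.8)
predicted <- c(5.67, 7.01, 9.2)

mae  <- mean(abs(predicted - actual))                   # mean absolute error
mape <- mean(abs((predicted - actual) / actual)) * 100  # error rate in percent
```

Note that MAPE is undefined when `actual` contains zeros, so check for that before using it.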

The slight differences from the results of the model training above result from using repeated cross-validation (hence the metrics are listed under "resampling results" there).

For the plotting part: regression error can easily be visualized by plotting the predicted against the actual target variable, and/or by plotting the error against the actual value. The perfect fit is represented by the additional line in those plots. This too can easily be achieved with standard tools:

plot(predicted~actual)
abline(0,1)

plot(predicted-actual~actual)
abline(0,0)
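The same predicted-vs-actual diagnostic can also be drawn with ggplot2 (not used above; this is an assumption on my part, and the small example vectors below are placeholders for the ones computed earlier):

```r
library(ggplot2)  # assumed available; install.packages("ggplot2") if not

# Placeholder vectors; substitute the predicted/actual values from above
actual    <- c(5.5, 6.9, 9.8, 6.5, 10, 1.5, 7.7)
predicted <- c(5.67, 7.01, 9.2, 6.1, 9.7, 1.0, 7.01)

# Scatter of predicted vs. actual with the perfect-fit identity line
p <- ggplot(data.frame(actual, predicted), aes(x = actual, y = predicted)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  labs(x = "Actual", y = "Predicted")
print(p)
```

The dashed identity line plays the same role as `abline(0,1)` in the base-graphics version.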

PS: if you are not familiar with regression/classification error measures and robust ML procedures, I would strongly recommend spending some time to read up on those topics - it will likely save you lots of time later. I personally would recommend Applied Predictive Modeling by Max Kuhn (maintainer of the caret package in R) and Kjell Johnson, as it's easy to read and very practical.

geekoverdose answered Apr 02 '26 21:04
