calculate distance between regression line and datapoint

I wonder if there is a way to calculate the distance between an abline in a plot and a data point. For example, what is the distance between the point with concentration == 40 and signal == 643 (element 5) and the abline?

concentration <- c(1,10,20,30,40,50)
signal <- c(4, 22, 44, 244, 643, 1102)
plot(concentration, signal)
res <- lm(signal ~ concentration)
abline(res)

Lisann asked Aug 02 '11 10:08


2 Answers

You are basically asking for the residuals.

R> residuals(res)
      1       2       3       4       5       6 
 192.61   12.57 -185.48 -205.52  -26.57  212.39 
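
To read off just the point from the question, the residual vector can be indexed like any other vector. A minimal sketch, assuming the vertical (y-direction) distance is what is wanted; the names idx, manual, direct and vertical_distance are introduced here for illustration only:

```r
concentration <- c(1, 10, 20, 30, 40, 50)
signal <- c(4, 22, 44, 244, 643, 1102)
res <- lm(signal ~ concentration)

# Position of the point of interest (element 5)
idx <- which(concentration == 40)

# A residual is the observed value minus the fitted value on the line
manual <- signal[idx] - fitted(res)[idx]

# The same number taken directly from residuals()
direct <- residuals(res)[idx]

# The vertical distance is the absolute value of the residual
vertical_distance <- abs(direct)
```

The sign of the residual says whether the point lies above (positive) or below (negative) the fitted line.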

As an aside, when you fit a linear regression with an intercept, the residuals sum to 0 (up to floating-point error):

R> sum(residuals(res))
[1] 8.882e-15

and if the model is correct, the residuals should follow a Normal distribution, which you can check with qqnorm(residuals(res)).

I find working with the standardised residuals easier.

> rstandard(res)
       1        2        3        4        5        6 
 1.37707  0.07527 -1.02653 -1.13610 -0.15845  1.54918 

These residuals have been scaled to have mean zero and variance (approximately) equal to one and, if the model is correct, follow a Normal distribution. Outlying standardised residuals are those larger than 2 in absolute value.
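
The scaling behind rstandard() can also be reproduced by hand: each residual is divided by the residual standard error times sqrt(1 - h), where h is the leverage of that point. A minimal sketch of this, with illustrative variable names (s, h, manual, outliers) that are not part of the answer above:

```r
concentration <- c(1, 10, 20, 30, 40, 50)
signal <- c(4, 22, 44, 244, 643, 1102)
res <- lm(signal ~ concentration)

s <- summary(res)$sigma   # residual standard error
h <- hatvalues(res)       # leverage of each observation

# Standardised residuals computed manually
manual <- residuals(res) / (s * sqrt(1 - h))

# Points flagged by the "larger than 2 in absolute value" rule of thumb
outliers <- which(abs(rstandard(res)) > 2)
```

For this data set no point exceeds the threshold, so outliers is empty.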

csgillespie answered Oct 23 '22 14:10


You can use the function below:

http://paulbourke.net/geometry/pointlineplane/pointline.r

Then just extract the slope and intercept:

> coef(res)
  (Intercept) concentration 
   -210.61098      22.00441

So your final answer would be:

concentration <- c(1,10,20,30,40,50)
signal <- c(4, 22, 44, 244, 643, 1102)
plot(concentration, signal)
res <- lm(signal ~ concentration)
abline(res)

cfs <- coef(res)
distancePointLine(y=signal[5], x=concentration[5], slope=cfs[2], intercept=cfs[1])

If you want a more general way to pick out a particular point, note that concentration == 40 returns a logical vector of the same length as concentration. You can use that vector to select points.

> pt.sel <- ( concentration == 40 )
> pt.sel
[1] FALSE FALSE FALSE FALSE  TRUE FALSE
> distancePointLine(y=signal[pt.sel], x=concentration[pt.sel], slope=cfs["concentration"], intercept=cfs["(Intercept)"])
     1.206032

Unfortunately distancePointLine doesn't appear to be vectorized (or it is, but it returns a warning when you pass it a vector). Otherwise you could get answers for all points just by leaving the [] selector off the x and y arguments.
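
If the linked function is unavailable, the perpendicular distance from a point (x0, y0) to the line y = intercept + slope * x is |y0 - (intercept + slope * x0)| / sqrt(1 + slope^2), which vectorises naturally. A minimal sketch, assuming perpendicular distance is what distancePointLine returns (the 1.206032 above is consistent with that); perp_distance is a hypothetical helper, not part of the linked code:

```r
concentration <- c(1, 10, 20, 30, 40, 50)
signal <- c(4, 22, 44, 244, 643, 1102)
res <- lm(signal ~ concentration)
cfs <- coef(res)

# Hypothetical helper: perpendicular distance from points (x, y)
# to the line y = intercept + slope * x. Accepts whole vectors.
perp_distance <- function(x, y, slope, intercept) {
  unname(abs(y - (intercept + slope * x)) / sqrt(1 + slope^2))
}

# Distances for all points at once
all_dist <- perp_distance(concentration, signal,
                          slope = cfs["concentration"],
                          intercept = cfs["(Intercept)"])

# Distance for the point with concentration == 40
all_dist[concentration == 40]
```

Note that the perpendicular distance depends on the units of both axes, so for judging a regression fit the residual (the vertical distance) is usually the more meaningful quantity.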

Ari B. Friedman answered Oct 23 '22 12:10