
Using experimental uncertainty when performing linear regression in R

Tags: r, regression, lm

I have an experimental data set which I'd like to fit to a polynomial. The data comprise an independent variable, the dependent variable, and the uncertainty in the measurement of the latter, for example:

2000  0.2084272   0.002067834
2500  0.207078125 0.001037248
3000  0.2054202   0.001959138
3500  0.203488075 0.000328942
4000  0.2013152   0.000646088
4500  0.198933825 0.001375657
5000  0.196375    0.000908696
5500  0.193668575 0.00014721
6000  0.1908432   0.000526976
6500  0.187926325 0.001217318
7000  0.1849442   0.000556495
7500  0.181921875 0.000401883
8000  0.1788832   0.001446992
8500  0.175850825 0.001235017
9000  0.1728462   0.001676249
9500  0.169889575 0.001011735
10000 0.167       0.000326678

(columns: x, y, ±y).

I can carry out a polynomial fit to these data with, for example:

mydata <- read.table("example.txt")
model <- lm(V2 ~ V1 + I(V1^2) + I(V1^3) + I(V1^4), data = mydata)

but this does not make use of the uncertainty values. How do I inform R that the third column of the data set is an uncertainty and that it should therefore be used in the regression analysis?

Asked by Joseph Wright


1 Answer

With measurement error in the dependent variable that is uncorrelated with the independent variables, the estimated coefficients are unbiased but the standard errors are too small. Here's the reference I used (pages 1 and 2): http://people.stfx.ca/tleo/econ370term2lec4.pdf
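
As a quick sanity check (my own sketch, not from that reference), a small simulation on data shaped like yours shows the slope estimate staying centred on the truth when noise is added to y:

# sketch: the OLS slope remains unbiased under measurement noise in y
set.seed(42)
x <- seq(2000, 10000, by = 500)
slopes <- replicate(2000, {
        y <- 0.22 - 5e-06 * x + rnorm(length(x), sd = 0.001)  # true line plus noise
        coef(lm(y ~ x))[["x"]]
})
mean(slopes)  # close to the true slope of -5e-06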

I think you just need to adjust the standard errors from those computed by lm(), and that's what I've tried to do in the code below. I'm no stats person, so you might want to post to Cross Validated and ask for better intuition.

For the example below, I assumed the "uncertainty" column is the standard deviation (or the standard error). For simplicity, I changed the model to just: y ~ x.

# initialize the dataset
df <- data.frame(
        x = c(2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500,
              7000, 7500, 8000, 8500, 9000, 9500, 10000),
        y = c(0.2084272, 0.207078125, 0.2054202, 0.203488075, 0.2013152,
              0.198933825, 0.196375, 0.193668575, 0.1908432, 0.187926325,
              0.1849442, 0.181921875, 0.1788832, 0.175850825, 0.1728462,
              0.169889575, 0.167),
        y.err = c(0.002067834, 0.001037248, 0.001959138, 0.000328942,
                  0.000646088, 0.001375657, 0.000908696, 0.00014721,
                  0.000526976, 0.001217318, 0.000556495, 0.000401883,
                  0.001446992, 0.001235017, 0.001676249, 0.001011735,
                  0.000326678)
)

df

# fit the regression model
model <- lm(y~x, data = df)
summary(model)

# pooled variance of the measurement error
#   thanks to: http://schools-wikipedia.org/wp/v/Variance.htm (law of total variance)
# mean of the per-observation variances, plus the variance of those variances
pooled.var.y.err <- mean((df$y.err)^2) + var((df$y.err)^2)

# variance of betaHat from the model: sigma^2 * (X'X)^-1
#   thanks to: http://stats.stackexchange.com/questions/44838/how-are-the-standard-errors-of-coefficients-calculated-in-a-regression
X <- cbind(1, df$x)  # design matrix: a column of ones for the intercept, plus x
# anova(model)[[3]][2] is the residual mean square, i.e. the estimate of sigma^2
# (if you add more variables to the model, both X above and the anova() index need updating)
var.betaHat <- anova(model)[[3]][2] * solve(t(X) %*% X)
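# sanity check: vcov(model) returns this same sigma^2 * (X'X)^-1 matrix directly
all.equal(var.betaHat, vcov(model), check.attributes = FALSE)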

# add the pooled measurement-error variance to the betaHat covariance matrix
# (the scalar is added to every entry of the matrix)
var.betaHat.adj <- var.betaHat + pooled.var.y.err

# calculate adjusted standard errors 
sqrt(diag(var.betaHat.adj))

# compare to un-adjusted standard errors
sqrt(diag(var.betaHat))
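
As an aside, if the goal is simply to tell lm() about the per-point uncertainties, the more standard route is weighted least squares: weight each observation by the inverse of its variance, so the more precise points pull harder on the fit. Note this changes the coefficient estimates as well as the standard errors, so it is a different treatment than the adjustment above:

# weighted least squares: weight each point by 1 / variance
wls.model <- lm(y ~ x, data = df, weights = 1 / df$y.err^2)
summary(wls.model)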

Answered by Andrew Leidner