Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursive regression in R

Say I have a data frame in R as follows:

> set.seed(1)
> X <- runif(50, 0, 1)
> Y <- runif(50, 0, 1)
> df <- data.frame(X,Y)
> head(df)

          X          Y
1 0.2655087 0.47761962
2 0.3721239 0.86120948
3 0.5728534 0.43809711
4 0.9082078 0.24479728
5 0.2016819 0.07067905
6 0.8983897 0.09946616

How do I perform a recursive regression of Y on X, starting at say the first 20 observations and increasing the regression window by one observation at a time until it covers the full sample?

There is a lot of information out there on how to perform a rolling regression of fixed window length (e.g. using rollapply in the zoo package). However, my search efforts have come up in vain when it comes to finding a simple recursive option, where the starting point is instead fixed and the window size increases. An exception is the lm.fit.recursive function from the quantregpackage (here). This works perfectly... but for the fact that it doesn't record any information about standard errors, which I need for a constructing recursive confidence intervals.

I can of course use a loop to achieve this. However, my actual data frame is very large and also grouped by id, which causes complications. So I'm hoping to find a more efficient option. Basically, I'm looking for the R equivalent of the "rolling [...], recursive" command in Stata.

like image 465
Grant Avatar asked Dec 24 '14 09:12

Grant


1 Answers

Maybe this will be of help:

set.seed(1)
X1 <- runif(50, 0, 1)
X2 <- runif(50, 0, 10) # I included another variable just for a better demonstration
Y <- runif(50, 0, 1)
df <- data.frame(X1,X2,Y)


rolling_lms <- lapply( seq(20,nrow(df) ), function(x) lm( Y ~ X1+X2, data = df[1:x , ]) )

Using the above lapply function you the recursive regression you want with full information.

For example for the first regression with 20 observations:

> summary(rolling_lms[[1]])

Call:
lm(formula = Y ~ X1 + X2, data = df[1:x, ])

Residuals:
     Min       1Q   Median       3Q      Max 
-0.45975 -0.19158 -0.05259  0.13609  0.67775 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.61082    0.17803   3.431  0.00319 **
X1          -0.37834    0.23151  -1.634  0.12060   
X2           0.01949    0.02541   0.767  0.45343   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2876 on 17 degrees of freedom
Multiple R-squared:  0.1527,    Adjusted R-squared:  0.05297 
F-statistic: 1.531 on 2 and 17 DF,  p-value: 0.2446

And has all the info you need.

> length(rolling_lms)
[1] 31

It performed 31 linear regressions starting from 20 observations and until it reached 50. Every regression with all the information is stored as an element of the rolling_lms list.

EDIT

As per Carl's comment below, in order to get a vector of all the slopes for each regression, for X1 variable on this occasion, this is a very good technique (as Carl suggested):

all_slopes<-unlist(sapply(1:31,function(j) rolling_lms[[j]]$coefficients[2]))

Output:

> all_slopes
         X1          X1          X1          X1          X1          X1          X1          X1          X1          X1 
-0.37833614 -0.23231852 -0.20465589 -0.20458938 -0.11796060 -0.14621369 -0.13861210 -0.11906724 -0.10149900 -0.14045509 
         X1          X1          X1          X1          X1          X1          X1          X1          X1          X1 
-0.14331323 -0.14450837 -0.16214836 -0.15715630 -0.17388457 -0.11427933 -0.10624746 -0.09767893 -0.10111773 -0.06415914 
         X1          X1          X1          X1          X1          X1          X1          X1          X1          X1 
-0.06432559 -0.04492075 -0.04122131 -0.06138768 -0.06287532 -0.06305953 -0.06491377 -0.01389334 -0.01703270 -0.03683358 
         X1 
-0.02039574 
like image 185
LyzandeR Avatar answered Sep 28 '22 08:09

LyzandeR