Say I have a data frame in R as follows:
> set.seed(1)
> X <- runif(50, 0, 1)
> Y <- runif(50, 0, 1)
> df <- data.frame(X,Y)
> head(df)
X Y
1 0.2655087 0.47761962
2 0.3721239 0.86120948
3 0.5728534 0.43809711
4 0.9082078 0.24479728
5 0.2016819 0.07067905
6 0.8983897 0.09946616
How do I perform a recursive regression of Y on X, starting at say the first 20 observations and increasing the regression window by one observation at a time until it covers the full sample?
There is a lot of information out there on how to perform a rolling regression of fixed window length (e.g. using rollapply
in the zoo
package). However, my search efforts have come up in vain when it comes to finding a simple recursive option, where the starting point is instead fixed and the window size increases. An exception is the lm.fit.recursive
function from the quantreg
package (here). This works perfectly... but for the fact that it doesn't record any information about standard errors, which I need for a constructing recursive confidence intervals.
I can of course use a loop to achieve this. However, my actual data frame is very large and also grouped by id, which causes complications. So I'm hoping to find a more efficient option. Basically, I'm looking for the R equivalent of the "rolling [...], recursive" command in Stata.
Maybe this will be of help:
set.seed(1)
X1 <- runif(50, 0, 1)
X2 <- runif(50, 0, 10) # I included another variable just for a better demonstration
Y <- runif(50, 0, 1)
df <- data.frame(X1,X2,Y)
rolling_lms <- lapply( seq(20,nrow(df) ), function(x) lm( Y ~ X1+X2, data = df[1:x , ]) )
Using the above lapply
function you the recursive regression you want with full information.
For example for the first regression with 20 observations:
> summary(rolling_lms[[1]])
Call:
lm(formula = Y ~ X1 + X2, data = df[1:x, ])
Residuals:
Min 1Q Median 3Q Max
-0.45975 -0.19158 -0.05259 0.13609 0.67775
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.61082 0.17803 3.431 0.00319 **
X1 -0.37834 0.23151 -1.634 0.12060
X2 0.01949 0.02541 0.767 0.45343
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2876 on 17 degrees of freedom
Multiple R-squared: 0.1527, Adjusted R-squared: 0.05297
F-statistic: 1.531 on 2 and 17 DF, p-value: 0.2446
And has all the info you need.
> length(rolling_lms)
[1] 31
It performed 31 linear regressions starting from 20 observations and until it reached 50. Every regression with all the information is stored as an element of the rolling_lms list.
EDIT
As per Carl's comment below, in order to get a vector of all the slopes for each regression, for X1 variable on this occasion, this is a very good technique (as Carl suggested):
all_slopes<-unlist(sapply(1:31,function(j) rolling_lms[[j]]$coefficients[2]))
Output:
> all_slopes
X1 X1 X1 X1 X1 X1 X1 X1 X1 X1
-0.37833614 -0.23231852 -0.20465589 -0.20458938 -0.11796060 -0.14621369 -0.13861210 -0.11906724 -0.10149900 -0.14045509
X1 X1 X1 X1 X1 X1 X1 X1 X1 X1
-0.14331323 -0.14450837 -0.16214836 -0.15715630 -0.17388457 -0.11427933 -0.10624746 -0.09767893 -0.10111773 -0.06415914
X1 X1 X1 X1 X1 X1 X1 X1 X1 X1
-0.06432559 -0.04492075 -0.04122131 -0.06138768 -0.06287532 -0.06305953 -0.06491377 -0.01389334 -0.01703270 -0.03683358
X1
-0.02039574
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With