Adding lagged variables to an lm model?

I'm using lm on a time series, which works quite well and is very fast.

Let's say my model is:

> formula <- y ~ x

I train this on a training set:

> train <- data.frame( x = seq(1,3), y = c(2,1,4) )
> model <- lm( formula, train )

... and I can make predictions for new data:

> test <- data.frame( x = seq(4,6) )
> test$y <- predict( model, newdata = test )
> test
  x        y
1 4 4.333333
2 5 5.333333
3 6 6.333333

This works super nicely, and it's really speedy.

I want to add lagged variables to the model. Now, I could do this by augmenting my original training set:

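> # parenthesised so the index is 1:(n-1); 1:nrow(train)-1 is (1:n)-1 and only works because index 0 is dropped
> train$y_1 <- c(0, train$y[1:(nrow(train)-1)])
> train
  x y y_1
1 1 2   0
2 2 1   2
3 3 4   1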

update the formula:

formula <- y ~ x * y_1

... and training will work just fine:

> model <- lm( formula, train )
> # no errors here

However, the problem is that there is no way of using predict in a batch manner, because y_1 in each test row is the predicted y of the previous row, which isn't known until that row has been predicted.

Now, for lots of other regression things, there are very convenient ways to express them in the formula, such as poly(x,2) and so on, and these work directly using the unmodified training and test data.
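To make the poly point concrete, here is a minimal sketch with made-up data (not my real series): the transformation lives in the formula, so predict can rebuild it from an untouched test frame.

```r
# Made-up data for illustration only: y is exactly x^2.
train <- data.frame(x = 1:6, y = (1:6)^2)

# The quadratic term is expressed in the formula; train gets no extra column.
model <- lm(y ~ poly(x, 2), data = train)

# The test frame only needs the raw x; predict() re-applies poly() itself.
test <- data.frame(x = 7:9)
test$y <- predict(model, newdata = test)
print(test)
```

This is the behaviour I'd like to get for lagged terms as well.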

So, I'm wondering if there is some way of expressing lagged variables in the formula, so that predict can be used? Ideally:

formula <- y ~ x * lag(y,-1)
model <- lm( formula, train )
test$y <- predict( model, newdata = test )

... without having to augment (not sure if that's the right word) the training and test datasets, and just being able to use predict directly?
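One caveat about the ideal formula above: as far as I can tell, stats::lag applied to a plain numeric vector only shifts the time-series attribute (tsp) and leaves the values in the same positions, so inside plain lm the term lag(y,-1) would line up with y itself rather than with the previous row. A minimal sketch with made-up numbers, where manual_lag is a hypothetical helper name and not an existing function:

```r
y <- c(2, 1, 4, 3, 6)   # made-up values

# lag() on a plain vector: the values stay put, only the "tsp" attribute moves
shifted <- lag(y, -1)
print(as.numeric(shifted))    # same values as y, in the same positions
print(attr(shifted, "tsp"))   # the time index has been shifted instead

# Hypothetical helper: shift the values down by k rows, padding with NA
manual_lag <- function(v, k = 1) {
  stopifnot(k >= 1, k < length(v))
  c(rep(NA, k), head(v, -k))
}
print(manual_lag(y))
```

So a row-wise lag in a data frame needs something like manual_lag, or a package that aligns the series by time index.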

asked Oct 27 '12 02:10

Hugh Perkins

2 Answers

Have a look at e.g. the dynlm package which gives you lag operators. More generally the Task Views on Econometrics and Time Series will have lots more for you to look at.

Here is the beginning of its examples -- a one and twelve month lag:

R> library("dynlm")
R> data("UKDriverDeaths", package = "datasets")
R> uk <- log10(UKDriverDeaths)
R> dfm <- dynlm(uk ~ L(uk, 1) + L(uk, 12))
R> dfm

Time series regression with "ts" data:
Start = 1970(1), End = 1984(12)

Call:
dynlm(formula = uk ~ L(uk, 1) + L(uk, 12))

Coefficients:
(Intercept)     L(uk, 1)    L(uk, 12)
      0.183        0.431        0.511
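For readers without dynlm installed, roughly the same regression can be sketched in base R with embed(). This assumes that L(uk, k) means the series shifted back by k months and that the first 12 observations are dropped; the column names uk_1 and uk_12 are made up for the illustration.

```r
data("UKDriverDeaths", package = "datasets")
uk <- log10(UKDriverDeaths)

# embed(x, 13) puts uk[t] in column 1 and uk[t-1] ... uk[t-12] in columns 2 to 13
lags <- embed(as.numeric(uk), 13)
d <- data.frame(uk = lags[, 1], uk_1 = lags[, 2], uk_12 = lags[, 13])

# Ordinary lm on the hand-built lag columns
fit <- lm(uk ~ uk_1 + uk_12, data = d)
print(coef(fit))
```

If the alignment assumed here is the same as dynlm's, the coefficients should agree with the ones printed above. The convenience of L() is that the lag columns never have to be built by hand.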

answered Sep 19 '22 15:09

Dirk Eddelbuettel

Following Dirk's suggestion on dynlm, I couldn't quite figure out how to predict, but searching for that led me to the dyn package via https://stats.stackexchange.com/questions/6758/1-step-ahead-predictions-with-dynlm-r-package

Then after several hours of experimentation I came up with the following function to handle the prediction. There were quite a few gotchas on the way, e.g. you can't seem to rbind time series, the result of predict is offset by start, and a whole bunch of things like that, so I feel this answer adds significantly compared to just naming a package, though I have upvoted Dirk's answer.

So, a solution that works is:

use the dyn package
use the following method for prediction

predictDyn method:

# pass in training data, test data,
# it will step through one by one
# need to give dependent var name, so that it can make this into a timeseries
predictDyn <- function( model, train, test, dependentvarname ) {
    Ntrain <- nrow(train)
    Ntest <- nrow(test)
    # can't rbind ts's apparently, so convert to numeric first
    train[,dependentvarname] <- as.numeric(train[,dependentvarname])
    test[,dependentvarname] <- as.numeric(test[,dependentvarname])
    testtraindata <- rbind( train, test )
    testtraindata[,dependentvarname] <- ts( as.numeric( testtraindata[,dependentvarname] ) )
    for( i in 1:Ntest ) {
       result <- predict(model,newdata=testtraindata,subset=1:(Ntrain+i-1))
       testtraindata[Ntrain+i,dependentvarname] <- result[Ntrain + i + 1 - start(result)][1]
    }
    return( testtraindata[(Ntrain+1):(Ntrain + Ntest),] )
}

Example usage:

library("dyn")

# size of training and test data
N <- 6
predictN <- 10

# create training data, which we can get exact fit on, so we can check the results easily
traindata <- c(1,2)
for( i in 3:N ) {
    traindata[i] <- 0.5 + 1.3 * traindata[i-2] + 1.7 * traindata[i-1]
}
train <- data.frame( y = ts( traindata ), foo = 1)

# create testing data, bunch of NAs
test <- data.frame( y = ts( rep(NA,predictN) ), foo = 1)

# fit a model
model <- dyn$lm( y ~ lag(y,-1) + lag(y,-2), train )
# look at the model, it's a perfect fit. Nice!
print(model)

test <- predictDyn( model, train, test, "y" )
print(test)

# nice plot
plot(test$y, type='l')

Output:

> model

Call:
lm(formula = dyn(y ~ lag(y, -1) + lag(y, -2)), data = train)

Coefficients:
(Intercept)   lag(y, -1)   lag(y, -2)
        0.5          1.7          1.3

> test
             y foo
7     143.2054   1
8     325.6810   1
9     740.3247   1
10   1682.4373   1
11   3823.0656   1
12   8686.8801   1
13  19738.1816   1
14  44848.3528   1
15 101902.3358   1
16 231537.3296   1

Edit: hmmm, this is super slow though. Even if I limit the data in the subset to a constant few rows of the dataset, it takes about 24 milliseconds per prediction, or, for my task, 0.024*7*24*8*20*10/60/60 = 1.792 hours :-O
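A possible way around the per-step cost, sketched under the assumption that the model is a plain two-lag regression like the example above: fit with ordinary lm on lag columns built by embed(), then roll the forecast forward from the coefficients instead of calling predict at every step. forecast_ar2 and the column names y_1 and y_2 are hypothetical names for this sketch, and it has not been timed against predictDyn.

```r
# Same made-up recurrence as the example above
y <- c(1, 2)
for (i in 3:6) y[i] <- 0.5 + 1.3 * y[i - 2] + 1.7 * y[i - 1]

# embed(y, 3) lays out y[t], y[t-1], y[t-2] as columns
lagged <- as.data.frame(embed(y, 3))
names(lagged) <- c("y", "y_1", "y_2")
model <- lm(y ~ y_1 + y_2, data = lagged)

# Hypothetical helper: h-step recursive forecast using the fitted coefficients
forecast_ar2 <- function(model, history, h) {
  b <- coef(model)
  out <- c(tail(history, 2), numeric(h))   # last two observed values seed the recursion
  for (i in seq_len(h)) {
    out[i + 2] <- b[["(Intercept)"]] + b[["y_1"]] * out[i + 1] + b[["y_2"]] * out[i]
  }
  out[-(1:2)]                              # drop the two seed values
}

preds <- forecast_ar2(model, y, 10)
print(preds)
```

Each step is a couple of multiplications, so the cost no longer depends on how predict handles the growing data frame. The trade-off is that the loop is tied to this particular model shape and would need rewriting for other formulas.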

answered Sep 20 '22 15:09

Hugh Perkins
