
Forecasting with `midasr` package: Inclusion of new high-frequency value

Tags:

r

time-series

I am trying to calculate one-step-ahead forecasts using the so-called MIDAS (mixed-data sampling) approach, in which forecasts of a low-frequency variable depend on data sampled at a higher frequency. For example, the dependent variable y could be recorded yearly and explained by an independent variable x that is sampled quarterly.

There is a package called `midasr` that offers many functions for this. I can calculate the one-step-ahead forecasts with its function `select_and_forecast` as follows (using simulated data, a simplified version of the example from the package's user guide):

Generation of the data:

library(midasr)
set.seed(1001)
n <- 250
trend <- c(1:n)
x <- rnorm(4 * n)
z <- rnorm(12 * n)
fn.x <- nealmon(p = c(1, -0.5), d = 8)
y <- 2 + 0.1 * trend + mls(x, 0:7, 4) %*% fn.x + rnorm(n)

Calculation of the forecasts (the out-of-sample forecast horizon is controlled by the argument `outsample`, so in my example I am calculating forecasts for observations 240 to 250, that is, 11 forecasts):

select_and_forecast(y~trend+mls(y,1,1,"*")+mls(x,0,4),
                          from=list(x=c(4)),
                          to=list(x=rbind(c(14,19))),
                          insample=1:250,outsample=240:250,
                          weights=list(x=c("nealmon","almonp")),
                          wstart=list(nealmon=rep(1,3),almonp=rep(1,3)),
                          IC="AIC",
                          seltype="restricted",
                          ftype="recursive",
                          measures=c("MSE"),
                          fweights=c("EW","BICW")
)$forecasts[[1]]$forecast

What I would like to do now is simulate a situation where a new value of the higher-frequency variable becomes available because a new high-frequency period (for example, a month or a quarter) has passed and its value can be used in the model. I would proceed as follows, but I am very unsure whether it is correct:

select_and_forecast(y~trend+mls(y,1,1,"*")+mls(x,0,4),
                          from=list(x=c(3)),   # The only change: the lower bound of the regressor's lag range is reduced from 4 to 3
                          to=list(x=rbind(c(14,19))),
                          insample=1:250,outsample=240:250,
                          weights=list(x=c("nealmon","almonp")),
                          wstart=list(nealmon=rep(1,3),almonp=rep(1,3)),
                          IC="AIC",
                          seltype="restricted",
                          ftype="recursive",
                          measures=c("MSE"),
                          fweights=c("EW","BICW")
)$forecasts[[1]]$forecast

In theory, one includes the new observation of the higher-frequency variable by reducing the time index (the lag), but I don't know whether using the function this way is correct.

This question is for someone who is familiar with the package. Could someone comment on this?

The formula I have in mind is:

y_t=\beta_0 + \beta_1B(L^{1/m};\theta)x_{t-h+1/m}^{(m)} + \epsilon_t^{(m)}

with h=1 in my case, and 1/m added to the time index to include one new high-frequency observation.

DatamineR asked Jan 14 '14 13:01



1 Answer

I am not sure that I understood your question correctly, so I will give an example that I hope answers it.

Suppose your response variable y is observed at a yearly frequency and the predictor variable x is observed quarterly (which corresponds to the simulated data). Say you are interested in forecasting next year's value of y using the data from the previous year. Then the model equation in the package `midasr` is the following:

y~mls(x,4:7,4)

The values 4:7 are the lags of x used for prediction, and the last argument, 4, indicates that there are 4 observations of x for every observation of y.

The package `midasr` uses the convention that for low-frequency period t=l we observe the high-frequency periods m*(l-1)+1:m (in R notation, i.e. m*(l-1)+1, ..., m*l). So for year 1 we have quarters 1, 2, 3, 4, and for year 2 we have quarters 5, 6, 7, 8. This convention assumes that we observe y for year 1 together with quarter 4 of x, y for year 2 together with quarter 8 of x, and so on.
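As a minimal base-R sketch of this indexing convention (the helper name `hf_periods` is hypothetical and not part of `midasr`), the high-frequency periods belonging to a given low-frequency period can be listed like this:

```r
# High-frequency periods that belong to low-frequency period l,
# following the convention m*(l-1) + 1:m described above.
# `hf_periods` is an illustrative name, not a midasr function.
hf_periods <- function(l, m) {
  # `:` binds tighter than `+`, so this is m*(l-1) + (1:m)
  m * (l - 1) + 1:m
}

hf_periods(1, 4)  # 1 2 3 4
hf_periods(2, 4)  # 5 6 7 8
hf_periods(3, 4)  # 9 10 11 12
```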

The MIDAS model is formulated in terms of lags, which start at zero. So if we want to explain y at year 1 (in our example the low frequency is yearly) with the values of x from the same year, i.e. quarters 4, 3, 2, 1, we use the lags 0, 1, 2, 3. If our goal is to explain y at year 2 with the values of x from year 1, then we use lags 4, 5, 6, 7, which correspond to quarters 4, 3, 2, 1.
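The lag arithmetic can be checked with a small base-R sketch. It assumes, as described above, that lag 0 is the last high-frequency period of the target low-frequency period. The function name `lag_to_quarter` is made up for illustration.

```r
# Quarter that lag k points to when the target is low-frequency period t.
# Lag 0 is the last quarter of period t, i.e. quarter m*t.
# `lag_to_quarter` is an illustrative name, not a midasr function.
lag_to_quarter <- function(t, k, m) {
  m * t - k
}

lag_to_quarter(1, 0:3, 4)  # 4 3 2 1: year 1 explained by its own quarters
lag_to_quarter(2, 4:7, 4)  # 4 3 2 1: year 2 explained by the quarters of year 1
```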

Now assume that we are in year 3 and have not yet observed its y value, but we have already observed the first quarter of year 3, i.e. quarter 9. Suppose we want to use this information for forecasting. Quarter 9 is three high-frequency lags behind the end of year 3 (quarter 12), hence the model specification is now

y~mls(x,3:7,4)

where we also include all the information about the previous year too.
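To see which quarters the lags 3:7 pick up, here is a base-R sketch of a lag-aligned regressor matrix. It only mimics the alignment convention described in this answer. The helper `aligned_lags` is hypothetical and is not the `midasr` implementation. The toy series uses the quarter number as its value so the entries are easy to read.

```r
# Hypothetical helper: row t, column for lag j holds x[m*t - j],
# or NA when the lag reaches before the start of the sample.
aligned_lags <- function(x, k, m) {
  n <- length(x) %/% m                     # number of low-frequency periods
  idx <- outer(m * seq_len(n), k, "-")     # n x length(k) matrix of quarter indices
  out <- matrix(NA_real_, nrow = n, ncol = length(k),
                dimnames = list(NULL, paste0("lag", k)))
  ok <- idx >= 1                           # positions that fall inside the sample
  out[ok] <- x[idx[ok]]
  out
}

x_toy <- 1:12                     # quarters 1..12, value = quarter number
X <- aligned_lags(x_toy, 3:7, 4)
X[3, ]                            # year 3 uses quarters 9 8 7 6 5
X[2, ]                            # year 2 uses quarters 5 4 3 2 1
```

Row 3 contains quarter 9 (the newly observed first quarter of year 3) followed by all four quarters of year 2, which matches the specification with lags 3:7.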

So if my example corresponds to what you are asking, then yes, including the new high-frequency observation is only a matter of changing the value of the `from` argument the way you did. However, I strongly suggest starting with one simple model to fully grasp how the package works.
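The link between the formula in the question and the lag numbers used here can be sketched as follows. This assumes the usual MIDAS lag polynomial, B(L^{1/m};\theta) = \sum_j b(j;\theta) L^{j/m}, and that a package lag k counts high-frequency periods back from the end of low-frequency period t. Both are assumptions made for this illustration.

```latex
% Expand the lag polynomial applied to the shifted regressor
B(L^{1/m};\theta)\, x^{(m)}_{t-h+1/m}
  = \sum_{j \ge 0} b(j;\theta)\, x^{(m)}_{t-h+1/m-j/m}

% A package lag k refers to x^{(m)}_{t-k/m}; matching the time indices gives
t - \frac{k}{m} = t - h + \frac{1}{m} - \frac{j}{m}
  \quad\Longrightarrow\quad k = mh - 1 + j

% With m = 4 and h = 1: k = 3 + j, so j = 0, \dots, 4 gives the lags 3, \dots, 7
```

Under these assumptions, adding 1/m to the time index lowers the first lag from mh = 4 to mh - 1 = 3, which is the change from 4 to 3 made in the question.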

mpiktas answered Nov 15 '22 03:11
