 

For loop for forecasting several datasets at once in R

I have a dataset with "Week, Region, Sales" variables and I want to forecast sales for each region with ARIMA or ETS (SES) from the forecast package. There are 70 regions in total, each with 152 weekly observations (about 3 years of data). Something like this:

Week         Region  Sales
01/1/2011      A       129
07/1/2011      A       140
14/1/2011      A       133
21/1/2011      A       189
...           ...      ...
01/12/2013     Z       324
07/12/2013     Z       210
14/12/2013     Z       155
21/12/2013     Z       386
28/12/2013     Z       266 

So I want R to treat each region as a separate dataset and fit an auto.arima model to it. I guess a for loop would be ideal here, but my attempts failed miserably. Ideally the loop would run something like this (one auto.arima per block of 152 observations):

library(forecast)
fit.A <- auto.arima(data$Sales[1:152])        # rows 1-152
fit.B <- auto.arima(data$Sales[153:304])      # rows 153-304
# ....
fit.Z <- auto.arima(data$Sales[10489:10640])  # last 152 rows (70 x 152 = 10640)
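A minimal sketch of the for loop described above, using split() so that no row ranges have to be hard-coded. It assumes the rows are sorted by Region and, within each region, by Week. The data frame below is simulated (5 regions rather than 70) purely so the example runs, and the names sales_by_region and fits are illustrative.

```r
library(forecast)

# Simulated stand-in for the question's data (hypothetical values):
# 5 regions x 152 weeks, sorted by Region and then by Week
set.seed(42)
data <- data.frame(
  Week   = rep(seq(as.Date("2011-01-01"), by = "week", length.out = 152), times = 5),
  Region = rep(c("A", "B", "C", "D", "E"), each = 152),
  Sales  = round(rnorm(5 * 152, mean = 200, sd = 40))
)

# split() returns one Sales vector per region, keeping the rows' original order
sales_by_region <- split(data$Sales, data$Region)

# Pre-allocate a named list and fill it with one auto.arima fit per region
fits <- vector("list", length(sales_by_region))
names(fits) <- names(sales_by_region)
for (r in names(sales_by_region)) {
  fits[[r]] <- auto.arima(sales_by_region[[r]])
}

fits[["A"]]  # plays the role of fit.A in the question
```

Keeping the fits in one named list (rather than separate fit.A, fit.B, ... variables) makes it easy to loop over them again later, for example to call forecast() on each.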

I came across this, but when I converted the data frame into a time series, all I got were NAs.
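It is hard to say where the NAs came from without the conversion code. One way to sidestep converting the whole data frame is to build a separate ts object from each region's numeric Sales column, as sketched below. The dates in the sample look day-first (dd/mm/yyyy), so they are parsed with an explicit format. The data are simulated, and frequency = 52 with a 2011 start is an assumption about weekly data.

```r
# Simulated data (hypothetical values) with the Week column stored as
# day-first "dd/mm/yyyy" text, as in the question's sample
set.seed(1)
weeks <- seq(as.Date("2011-01-01"), by = "week", length.out = 152)
data <- data.frame(
  Week   = rep(format(weeks, "%d/%m/%Y"), times = 3),
  Region = rep(c("A", "B", "C"), each = 152),
  Sales  = round(rnorm(3 * 152, mean = 200, sd = 40))
)

# Parse the dates with an explicit day-first format, then sort chronologically
data$Week <- as.Date(data$Week, format = "%d/%m/%Y")
data <- data[order(data$Region, data$Week), ]

# One ts per region, built from the numeric Sales column only
# (frequency = 52 and start = c(2011, 1) are assumptions for weekly data)
series <- lapply(split(data, data$Region), function(d) {
  ts(d$Sales, start = c(2011, 1), frequency = 52)
})
```

Each element of series can then be passed to auto.arima(), ets() or ses(). Note that with frequency > 1, auto.arima() will also consider seasonal terms.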

Any help is appreciated! Thank you.

Shraddha asked Jul 23 '14 at 12:07



2 Answers

Try the very efficient data.table package (assuming your data set is called temp):

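library(data.table)
library(forecast)
# convert temp to a data.table in place and fit one auto.arima per Region
# (rows are assumed to be in time order); list() lets each model sit in a list column
temp <- setDT(temp)[, list(AR = list(auto.arima(Sales))), by = Region]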

The last step stores the results in temp as a list column (a list is the only way to store this type of object in a data.table column).

Afterwards you can do any operation you want on these lists, for example inspecting them:

temp$AR
#[[1]]
# Series: Sales 
# ARIMA(0,0,0) with non-zero mean 
# 
# Coefficients:
#       intercept
#        147.7500
# s.e.    12.0697
# 
# sigma^2 estimated as 582.7:  log likelihood=-18.41
# AIC=40.82   AICc=52.82   BIC=39.59
#
#[[2]]
# Series: Sales 
# ARIMA(0,0,0) with non-zero mean 
# 
# Coefficients:
#       intercept
#        268.2000
# s.e.    36.4404
# 
# sigma^2 estimated as 6639:  log likelihood=-29.1
# AIC=62.19   AICc=68.19   BIC=61.41
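Printing every model gets long with 70 regions. The sketch below collapses the list column into one row per region, using forecast's arimaorder() for the selected (p, d, q) and the aicc element stored on each fit. The simulated temp table and the name summary_dt are only for illustration.

```r
library(data.table)
library(forecast)

# Simulated data (hypothetical values): 2 regions x 152 weeks
set.seed(3)
temp <- data.table(
  Region = rep(c("A", "B"), each = 152),
  Sales  = round(rnorm(2 * 152, mean = 200, sd = 40))
)
temp <- temp[, list(AR = list(auto.arima(Sales))), by = Region]

# One row per region: the selected (p, d, q) order and the model's AICc.
# Within each group AR is a length-1 list, so AR[[1]] is that region's model.
summary_dt <- temp[, {
  m   <- AR[[1]]
  ord <- arimaorder(m)
  list(p = ord[["p"]], d = ord[["d"]], q = ord[["q"]], AICc = m$aicc)
}, by = Region]

summary_dt
```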

Or plot the forecasts (and so on):

# one 10-step-ahead forecast plot per region
temp[, sapply(AR, function(x) plot(forecast(x, h = 10)))]
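If numbers are more useful than plots, the point forecasts can be pulled into a long table with one row per region and step. This is a sketch on simulated data (3 regions); the horizon of 10 and the column names step and point_forecast are arbitrary choices.

```r
library(data.table)
library(forecast)

# Simulated data (hypothetical values): 3 regions x 152 weeks
set.seed(7)
temp <- data.table(
  Region = rep(c("A", "B", "C"), each = 152),
  Sales  = round(rnorm(3 * 152, mean = 200, sd = 40))
)
temp <- temp[, list(AR = list(auto.arima(Sales))), by = Region]

# 10-step-ahead point forecasts ($mean) in long format: 10 rows per region
fc <- temp[, list(step = 1:10,
                  point_forecast = as.numeric(forecast(AR[[1]], h = 10)$mean)),
           by = Region]

head(fc)
```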
David Arenburg answered Oct 18 '22 at 20:10



You can do this easily with dplyr. Assuming your data frame is named df, run:

library(dplyr)
library(forecast)
# one auto.arima fit per Region; inside do(), `.` is the current group's rows
model_fits <- group_by(df, Region) %>% do(fit = auto.arima(.$Sales))

The result is a data frame containing the model fits for each region:

> head(model_fits)
Source: local data frame [6 x 2]
Groups: <by row>

  Region        fit
1      A <S3:Arima>
2      B <S3:Arima>
3      C <S3:Arima>
4      D <S3:Arima>
5      E <S3:Arima>
6      F <S3:Arima>
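do() still works, but the dplyr documentation marks it as superseded since dplyr 1.0.0. A sketch of a roughly equivalent pipeline with nest_by(), assuming dplyr 1.0.0 or later and simulated data:

```r
library(dplyr)
library(forecast)

# Simulated data (hypothetical values): 3 regions x 152 weeks
set.seed(11)
df <- data.frame(
  Region = rep(c("A", "B", "C"), each = 152),
  Sales  = round(rnorm(3 * 152, mean = 200, sd = 40))
)

# nest_by() returns one row per Region, with that region's rows stored in a
# list column called `data`; the result is rowwise, so mutate() fits one
# model per row and list() keeps each fit in a list column
model_fits <- df %>%
  nest_by(Region) %>%
  mutate(fit = list(auto.arima(data$Sales)))
```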

You can get a list with each model fit like so:

> model_fits$fit
[[1]]
Series: .$Sales 
ARIMA(0,0,0) with non-zero mean 

Coefficients:
      intercept
       196.0000
s.e.    14.4486

sigma^2 estimated as 2088:  log likelihood=-52.41
AIC=108.82   AICc=110.53   BIC=109.42

[[2]]
Series: .$Sales 
ARIMA(0,0,0) with non-zero mean 

Coefficients:
      intercept
       179.2000
s.e.    14.3561

sigma^2 estimated as 2061:  log likelihood=-52.34
AIC=108.69   AICc=110.4   BIC=109.29
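The question also mentions ETS/SES. The same per-region pattern works with forecast's ets() and ses(); here is a sketch with split() and lapply() on simulated data (2 regions), where the variable names are illustrative.

```r
library(forecast)

# Simulated data (hypothetical values): 2 regions x 152 weeks
set.seed(5)
df <- data.frame(
  Region = rep(c("A", "B"), each = 152),
  Sales  = round(rnorm(2 * 152, mean = 200, sd = 40))
)
sales_by_region <- split(df$Sales, df$Region)

# ets() selects an exponential smoothing model automatically;
# ses() fits simple exponential smoothing and returns forecasts directly
ets_fits      <- lapply(sales_by_region, ets)
ses_forecasts <- lapply(sales_by_region, ses, h = 10)

# 10-step point forecasts from the ets() models: one column per region
ets_means <- sapply(ets_fits, function(m) as.numeric(forecast(m, h = 10)$mean))
```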
ramhiser answered Oct 18 '22 at 19:10
