
Time series modelling with irregular data

I'm currently working on a pet project to forecast future base oil prices from historical base oil prices. The data is weekly but there are some periods in between where prices are missing.

I'm reasonably comfortable modelling time series with complete data, but when it comes to irregular ones, the models I've learnt may not apply. Should I use the xts class and proceed with ARIMA models in R the usual way?

After building a model to predict future prices, I'd like to factor in crude oil price fluctuations, diesel profit margins, car sales, economic growth and so on (a multivariate model?) to improve accuracy. Can someone shed some light on how to go about this efficiently? In my mind, it looks like a maze.

EDIT: The trimmed data is here: https://docs.google.com/document/d/18pt4ulTpaVWQhVKn9XJHhQjvKwNI9uQystLL4WYinrY/edit

Code:

mod.fit <- arima(Y, order = c(3, 2, 6), method = "ML")

Result:

Warning message:
In log(s2) : NaNs produced

Will this warning affect my model accuracy?

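With missing data, I can't use the ACF and PACF. Is there a better way to select models? I used AIC (Akaike's Information Criterion) to compare different ARIMA models with the code below. I took the smallest AIC (1314.239, in cell [3,6] of the result) as ARIMA(3,2,6), but since the loop stores order (p, q) in AIC[p+1, q+1], that cell actually corresponds to ARIMA(2,2,5).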

Code:

# AIC of ARIMA(p, 2, q) for p, q = 0..5, stored in AIC[p + 1, q + 1]
AIC <- matrix(0, 6, 6)
for (p in 0:5) {
  for (q in 0:5) {
    mod.fit <- arima(Y, order = c(p, 2, q))
    AIC[p + 1, q + 1] <- mod.fit$aic
  }
}
AIC

Result:

              [,1]     [,2]     [,3]     [,4]     [,5]     [,6] 
    [1,] 1396.913 1328.481 1327.896 1328.350 1326.057 1325.063 
    [2,] 1343.925 1326.862 1328.321 1328.644 1325.239 1318.282 
    [3,] 1334.642 1328.013 1330.005 1327.304 1326.882 1314.239 
    [4,] 1336.393 1329.954 1324.114 1322.136 1323.567 1316.150 
    [5,] 1319.137 1321.030 1320.575 1321.287 1323.750 1316.815 
    [6,] 1321.135 1322.634 1320.115 1323.670 1325.649 1318.015
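As a sketch of reading the chosen order off the matrix, the values printed above can be copied into R and the minimum located with which.min() and arrayInd(), remembering that row p + 1 and column q + 1 hold ARIMA(p, 2, q). The name aic_tab is an illustrative choice, not from the original code.

```r
# AIC values copied from the matrix printed above; by construction of the
# loop, row p + 1 and column q + 1 hold the fit of ARIMA(p, 2, q).
aic_tab <- matrix(c(
  1396.913, 1328.481, 1327.896, 1328.350, 1326.057, 1325.063,
  1343.925, 1326.862, 1328.321, 1328.644, 1325.239, 1318.282,
  1334.642, 1328.013, 1330.005, 1327.304, 1326.882, 1314.239,
  1336.393, 1329.954, 1324.114, 1322.136, 1323.567, 1316.150,
  1319.137, 1321.030, 1320.575, 1321.287, 1323.750, 1316.815,
  1321.135, 1322.634, 1320.115, 1323.670, 1325.649, 1318.015
), nrow = 6, byrow = TRUE)

# (row, col) position of the smallest AIC.
best <- arrayInd(which.min(aic_tab), dim(aic_tab))

# Subtract 1 to turn matrix indices back into ARMA orders.
best_p <- best[1, 1] - 1
best_q <- best[1, 2] - 1
cat(sprintf("Lowest AIC %.3f: ARIMA(%d,2,%d)\n", min(aic_tab), best_p, best_q))
```

This prints `Lowest AIC 1314.239: ARIMA(2,2,5)`. Note that q = 5 is the edge of the searched grid, and the next-best fits (1316.150 and 1316.815) are only about two to three AIC units higher, so the choice is not clear-cut.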
asked Nov 18 '11 09:11 by leejy



1 Answer

No. In general you can't simply put the data in an xts object and then fit an ARIMA; an extra step is required. Missing values, recorded as NA, are handled by arima(): with method = "ML" they are handled exactly, whereas the other methods may not compute the innovations for the missing observations. This works because arima() fits the model in a state-space representation, using a Kalman filter to evaluate the likelihood.
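A minimal sketch of this, using simulated weekly data with gaps as a hypothetical stand-in for the base oil prices: arima() with method = "ML" is fitted directly to a ts containing NA, and predict() then forecasts ahead. The ARIMA(1,1,0) order, the gap positions and the variable names are arbitrary choices for illustration. It also shows that acf() and pacf() accept na.action = na.pass, which is one way round the ACF/PACF problem raised in the question.

```r
# Hypothetical stand-in for the weekly prices: a simulated random walk
# with AR(1) increments (the real data is not reproduced here).
set.seed(42)
n <- 200
steps <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
y <- ts(100 + cumsum(steps), frequency = 52)

# Mimic the gaps: a few runs of weeks with no recorded price (12 NAs).
y[c(30:34, 90:95, 150)] <- NA

# method = "ML" evaluates the exact likelihood through the state-space
# (Kalman filter) form, so the NA weeks are skipped rather than imputed.
fit <- arima(y, order = c(1, 1, 0), method = "ML")
print(fit)

# Forecast the next 4 weeks, with standard errors.
fc <- predict(fit, n.ahead = 4)
print(fc$pred)

# Sample ACF/PACF despite the gaps: na.pass keeps the NAs and the
# correlations are computed from pairs where both values are present.
dy <- diff(y)
a  <- acf(dy, na.action = na.pass, plot = FALSE)
pa <- pacf(dy, na.action = na.pass, plot = FALSE)
```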

If the data are regular (equally spaced in time) but have missing values, the above should be fine.

The reason I say you can't, in general, just use xts is that arima() expects a univariate time series object (see ?ts) as its input. However, xts extends and inherits from zoo, and the zoo package provides an as.ts() method for objects of class "zoo". So if you get your data into a zoo() or xts() object, you can then coerce it to class "ts", which should put NA in the appropriate places; arima() will then handle those if it can (i.e. if there aren't too many missing values).
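The coercion step can be sketched as follows, assuming the zoo package is installed and the prices are held with one row per recorded week, so the unrecorded weeks are simply absent (the dates, values and gap positions here are made up). as.ts() fills the absent weeks with NA on a regular grid; because a Date index gives a grid measured in days (frequency 1/7), the sketch re-declares the result as a weekly ts with frequency 52 before fitting. That re-declaration is an illustrative choice, not part of the answer.

```r
library(zoo)

# Hypothetical example: 20 weekly prices held as a zoo series in which the
# unrecorded weeks are absent rows rather than NA entries.
set.seed(1)
weeks  <- seq(as.Date("2011-01-07"), by = "week", length.out = 20)
values <- 100 + cumsum(rnorm(20))
gone   <- c(4, 5, 11, 16)                 # weeks with no recorded price
prices <- zoo(values[-gone], weeks[-gone])

# as.ts() on a regular zoo series fills the absent weeks with NA,
# so the gaps sit in the right positions.
y_ts <- as.ts(prices)

# Re-declare as weekly (52 per year); arima() with method = "ML"
# then handles the NA weeks exactly.
y   <- ts(as.numeric(y_ts), frequency = 52)
fit <- arima(y, order = c(1, 1, 0), method = "ML")
print(fit)
```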

answered Sep 30 '22 09:09 by Gavin Simpson