
Time series forecasting, dealing with known big orders


I have many data sets with known outliers (big orders), for example:

data <- matrix(c(
  "08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4",
  "10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4",
  "12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4",
  "14Q1","14Q2","14Q3","14Q4","15Q1",
  155782698, 159463653.4, 172741125.6, 204547180,
  126049319.8, 138648461.5, 135678842.1, 242568446.1,
  177019289.3, 200397120.6, 182516217.1, 306143365.6,
  222890269.2, 239062450.2, 229124263.2, 370575384.7,
  257757410.5, 256125841.6, 231879306.6, 419580274,
  268211059, 276378232.1, 261739468.7, 429127062.8,
  254776725.6, 329429882.8, 264012891.6, 496745973.9,
  284484362.55),
  ncol=2, byrow=FALSE)

The top 11 outliers of this specific series are:

outliers <- matrix(c(
  "14Q4","14Q2","12Q1","13Q1","14Q2","11Q1","11Q4","14Q2","13Q4","14Q4","13Q1",
  20193525.68, 18319234.7, 12896323.62, 12718744.01, 12353002.09, 11936190.13,
  11356476.28, 11351192.31, 10101527.85, 9723641.25, 9643214.018),
  ncol=2, byrow=FALSE)

What methods can I use to forecast the time series while taking these outliers into consideration?

I have already tried replacing the outliers with the next biggest, one at a time (running the data set 10 times, replacing one more outlier each time, until the 10th data set has all the outliers replaced). I have also tried simply removing the outliers (again running the data set 10 times, removing one more outlier each time, until all 10 are removed in the 10th data set).

I just want to point out that removing these big orders does not delete the data point completely, as there are other deals that happen in that quarter.
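To make the "removing" step concrete, here is a minimal sketch of one possible reading of it: the known big orders are summed per quarter and subtracted from that quarter's total, so the quarter stays in the series. It uses only the 2014 values quoted above; the names `totals`, `big`, `adjustment` and `adjusted` are hypothetical and are not taken from the asker's code.

```r
# Excerpt of the first series: quarterly totals for 2014 (values from the question).
totals <- c("14Q1" = 254776725.6, "14Q2" = 329429882.8,
            "14Q3" = 264012891.6, "14Q4" = 496745973.9)

# Known big orders that fall in 2014 (values from the outlier list above).
big <- data.frame(
  period = c("14Q4", "14Q2", "14Q2", "14Q2", "14Q4"),
  amount = c(20193525.68, 18319234.7, 12353002.09, 11351192.31, 9723641.25)
)

# Sum the big orders per quarter; quarters without big orders keep an adjustment of 0.
big_by_qtr <- tapply(big$amount, big$period, sum)
adjustment <- setNames(rep(0, length(totals)), names(totals))
adjustment[names(big_by_qtr)] <- big_by_qtr

# Adjusted series: every quarter is still present, minus its known big orders.
adjusted <- totals - adjustment
print(adjusted)
```

The adjusted vector can then be passed to any of the forecasting models in place of the raw totals, which is why this kind of preprocessing is model-agnostic.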

My code tests the data through multiple forecasting models (ARIMA weighted on the out-of-sample data, ARIMA weighted on the in-sample data, ARIMA weighted, ARIMA, additive Holt-Winters weighted and multiplicative Holt-Winters weighted), so it needs to be something that can be adapted to all of these models.

Here are a couple more data sets that I used. I do not have the outliers for these series yet, though:

data <- matrix(c(
  "08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4",
  "10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4",
  "12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4",
  "14Q1","14Q2","14Q3",
  26393.99306, 13820.5037, 23115.82432, 25894.41036,
  14926.12574, 15855.8857, 21565.19002, 49373.89675,
  27629.10141, 43248.9778, 34231.73851, 83379.26027,
  54883.33752, 62863.47728, 47215.92508, 107819.9903,
  53239.10602, 71853.5, 59912.7624, 168416.2995,
  64565.6211, 94698.38748, 80229.9716, 169205.0023,
  70485.55409, 133196.032, 78106.02227),
  ncol=2, byrow=FALSE)

data <- matrix(c(
  "08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4",
  "10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4",
  "12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4",
  "14Q1","14Q2","14Q3",
  3311.5124, 3459.15634, 2721.486863, 3286.51708,
  3087.234059, 2873.810071, 2803.969394, 4336.4792,
  4722.894582, 4382.349583, 3668.105825, 4410.45429,
  4249.507839, 3861.148928, 3842.57616, 5223.671347,
  5969.066896, 4814.551389, 3907.677816, 4944.283864,
  4750.734617, 4440.221993, 3580.866991, 3942.253996,
  3409.597269, 3615.729974, 3174.395507),
  ncol=2, byrow=FALSE)

If this is too complicated, then I would appreciate an explanation of how, in R, the data is handled for forecasting once outliers have been detected with the usual commands (e.g. smoothing), and how I could approach writing that code myself rather than relying on the commands that detect outliers.

Summer-Jade Gleek'away asked Apr 13 '15 11:04



2 Answers

Your outliers appear to be seasonal variations, with the largest orders appearing in the fourth quarter. Many of the forecasting models you mentioned can handle seasonal adjustments. As an example, the simplest model could have a linear dependence on year with a correction for each quarter. The code would look like:

library(reshape2)
library(ggplot2)

df <- data.frame(
  period = c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4",
             "10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4",
             "12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4",
             "14Q1","14Q2","14Q3","14Q4","15Q1"),
  order = c(155782698, 159463653.4, 172741125.6, 204547180,
            126049319.8, 138648461.5, 135678842.1, 242568446.1,
            177019289.3, 200397120.6, 182516217.1, 306143365.6,
            222890269.2, 239062450.2, 229124263.2, 370575384.7,
            257757410.5, 256125841.6, 231879306.6, 419580274,
            268211059, 276378232.1, 261739468.7, 429127062.8,
            254776725.6, 329429882.8, 264012891.6, 496745973.9,
            42748656.73))

# Split the period label into a numeric year and a quarter label
seasonal <- data.frame(year = as.numeric(substr(df$period, 1, 2)),
                       qtr  = substr(df$period, 3, 4),
                       data = df$order)

# Linear trend in year plus one offset per quarter
ord_model <- lm(data ~ year + qtr, data = seasonal)
seasonal <- cbind(seasonal, fitted = fitted(ord_model))

# Long format: one row per (year, qtr, Source), where Source is data or fitted
plot_fit <- melt(seasonal, id.vars = c("year", "qtr"),
                 variable.name = "Source", value.name = "Order")
ggplot(plot_fit, aes(x = year, y = Order, colour = qtr, shape = Source)) +
  geom_point(size = 3)

which gives the results shown in the chart below:

[Figure: Linear fit with seasonal adjustments]

Models with a seasonal adjustment but nonlinear dependence upon year may give better fits.
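As one illustration of a nonlinear dependence on year, the sketch below fits the same `year + qtr` model to the log of the orders, which makes the trend exponential and the quarterly effects multiplicative after back-transforming. This is only one of many possible nonlinear choices and is an assumption, not something tested in this answer. It uses the 28 complete quarters 08Q1 to 14Q4 from the question, and the names `orders`, `lin_model`, `log_model` and `rmse` are introduced here for illustration.

```r
# Quarterly orders 08Q1-14Q4 (values from the question; 15Q1 is left out here).
orders <- c(155782698, 159463653.4, 172741125.6, 204547180,
            126049319.8, 138648461.5, 135678842.1, 242568446.1,
            177019289.3, 200397120.6, 182516217.1, 306143365.6,
            222890269.2, 239062450.2, 229124263.2, 370575384.7,
            257757410.5, 256125841.6, 231879306.6, 419580274,
            268211059, 276378232.1, 261739468.7, 429127062.8,
            254776725.6, 329429882.8, 264012891.6, 496745973.9)

seasonal <- data.frame(
  year = rep(8:14, each = 4),
  qtr  = factor(rep(paste0("Q", 1:4), times = 7)),
  data = orders
)

# Linear model as in the answer, kept for comparison.
lin_model <- lm(data ~ year + qtr, data = seasonal)

# Log-linear variant: fitted on log(data), back-transformed with exp().
log_model <- lm(log(data) ~ year + qtr, data = seasonal)
seasonal$fitted_lin <- fitted(lin_model)
seasonal$fitted_log <- exp(fitted(log_model))

# In-sample root-mean-square error of both fits on the original scale.
rmse <- function(actual, fit) sqrt(mean((actual - fit)^2))
print(c(linear    = rmse(seasonal$data, seasonal$fitted_lin),
        loglinear = rmse(seasonal$data, seasonal$fitted_log)))
```

Comparing in-sample error alone favours more flexible models, so a hold-out comparison on the last few quarters would be a fairer check before choosing between the two.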

WaltS answered Sep 21 '22 23:09


You already said you tried different ARIMA models, but as WaltS mentioned, your series does not seem to contain big outliers so much as a seasonal component, which is nicely captured by auto.arima() in the forecast package:

library(forecast)

# Quarterly series starting in 2008 Q1, built from the question's data matrix
myTs <- ts(as.numeric(data[,2]), start=c(2008, 1), frequency=4)

myArima <- auto.arima(myTs, lambda=0)
myForecast <- forecast(myArima)
plot(myForecast)


where the lambda=0 argument to auto.arima() applies a Box-Cox transformation with lambda = 0, which is the same as taking the log of the data, so that the increasing amplitude of the seasonal component is taken into account.
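The Box-Cox transform is (y^lambda - 1)/lambda, which tends to log(y) as lambda goes to 0. A rough base-R equivalent is therefore to fit on log(myTs) and back-transform the forecasts with exp(). The sketch below does that with stats::arima(); the (0,1,1)(0,1,1) order is an assumption chosen for illustration and is not necessarily the model auto.arima() would select, and the names `orders`, `fit`, `log_fc` and `point_fc` are hypothetical.

```r
# First series from the question, 08Q1-15Q1.
orders <- c(155782698, 159463653.4, 172741125.6, 204547180,
            126049319.8, 138648461.5, 135678842.1, 242568446.1,
            177019289.3, 200397120.6, 182516217.1, 306143365.6,
            222890269.2, 239062450.2, 229124263.2, 370575384.7,
            257757410.5, 256125841.6, 231879306.6, 419580274,
            268211059, 276378232.1, 261739468.7, 429127062.8,
            254776725.6, 329429882.8, 264012891.6, 496745973.9,
            284484362.55)
myTs <- ts(orders, start = c(2008, 1), frequency = 4)

# Fit on the log scale. The seasonal ARIMA order here is an assumption.
fit <- arima(log(myTs), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 4))

# Forecast four quarters ahead on the log scale.
log_fc <- predict(fit, n.ahead = 4)

# Back-transform the point forecasts and an approximate 95% interval.
point_fc <- exp(log_fc$pred)
lower    <- exp(log_fc$pred - 1.96 * log_fc$se)
upper    <- exp(log_fc$pred + 1.96 * log_fc$se)
print(cbind(lower, point_fc, upper))
```

Because the seasonal swings become roughly constant in size on the log scale, the same transform-then-back-transform step can be wrapped around additive Holt-Winters or any of the other models being compared.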

J.R. answered Sep 20 '22 23:09