
Time Series prediction with multiple features in the input data

Assume we have time-series data containing the daily order counts for the last two years.

We can predict future orders using Python's statsmodels library:

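import statsmodels.api as sm

# Fit a seasonal ARIMA model with a weekly (7-day) season.
fit1 = sm.tsa.statespace.SARIMAX(
    train.Count, order=(2, 1, 4), seasonal_order=(0, 1, 1, 7)
).fit()

# Forecast the hold-out period from the fitted model.
y_hat_avg['SARIMA'] = fit1.predict(
    start="2018-06-16", end="2018-08-14", dynamic=True
)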

Result (don't mind the numbers):

[Image: plot of the forecast]

Now assume that our input data has some unusual increases or decreases caused by holidays or company promotions. So we added two columns that flag whether each day was a "holiday" and whether the company ran a "promotion" that day.

[Image: sample of the data with the added "holiday" and "promotion" columns]

Is there a method (and a way of implementing it in Python) to use this new type of input data so that the model understands the reason for the outliers, and can also predict future orders when given "holiday" and "promotion_day" information? For example, something like:

fit1.predict('2018-08-29', holiday=True, is_promotion=False)
# or
fit1.predict(start="2018-08-20", end="2018-08-25", holiday=[0,0,0,1,1,0], is_promotion=[0,0,1,1,0,1])
Saeed Esmaili asked Aug 15 '18 05:08


1 Answer

SARIMAX, a generalisation of the SARIMA model, is designed to handle exactly this. From the docs:

Parameters:

  • endog (array_like) – The observed time-series process y;
  • exog (array_like, optional) – Array of exogenous regressors, shaped (nobs, k).

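You could pass the holiday and promotion_day columns as an array of shape (nobs, 2) to exog, which lets the model attribute part of the variation in those observations to the exogenous regressors. When forecasting beyond the training sample, the future values of these regressors must also be supplied through the exog argument of predict or forecast.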

Niels Wouda answered Oct 04 '22 00:10