
Python seasonal_decompose: determining the freq parameter

Although this question seems to have been tackled many times, I cannot figure out why seasonal_decompose doesn't work in my case, even though I am giving it a dataframe with a datetime index as input. Here is an example of my dataset:

    Customer order actual date  Sales Volumes
0   01/01/1900                           300
1   10/03/2008                          3000
2   15/11/2013                            10
3   23/12/2013                           200
4   04/03/2014                             5
5   17/03/2014                            30
6   22/04/2014                             1
7   26/06/2014                           290
8   30/06/2014                            40

The code snippet is shown below:

from statsmodels.tsa.seasonal import seasonal_decompose
df_agg['Customer order actual date'] = pd.to_datetime(df_agg['Customer order actual date'])
df_agg = df_agg.set_index('Customer order actual date')
df_agg.reset_index().sort_values('Customer order actual date', ascending=True)
decomposition = seasonal_decompose(np.asarray(df_agg['Sales Volumes'] ), model = 'multiplicative')

But I systematically get the following error:

: You must specify a freq or x must be a pandas object with a timeseries index witha freq not set to None

Could you please explain why I have to supply a freq even though I am using a dataframe with a datetime index? Does it make sense to pass a frequency as an input parameter when the seasonality is what I expect seasonal_decompose to give me as output?

Galileo asked May 31 '18 05:05



3 Answers

The seasonal_decompose function gets the frequency from the index's inferred_freq attribute. Here is the link: https://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DatetimeIndex.html

inferred_freq, in turn, is generated by infer_freq, and infer_freq, when it is passed a Series, uses the values of the series and not its index: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.infer_freq.html

This might be a reason why freq needs to be set to a value even with a timeseries index.
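As a small illustration of the point above, here is a sketch comparing a regularly spaced index with an irregular one. The dates are made up for the example (the irregular ones loosely follow the question's sample), and only standard pandas attributes are used.

```python
import pandas as pd

# A regularly spaced index: one timestamp per day.
regular = pd.date_range("2014-01-01", periods=6, freq="D")

# An irregular index, similar in spirit to the order dates in the question.
irregular = pd.to_datetime(["2013-11-15", "2013-12-23", "2014-03-04", "2014-03-17"])

print(regular.inferred_freq)    # D
print(irregular.inferred_freq)  # None
print(irregular.freq)           # None
```

When no frequency can be inferred, seasonal_decompose has nothing to fall back on, which is consistent with the error in the question.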

And in case you want to know what frequency is in seasonal_decompose() - It is the property of your data. So if you collected your data month by month, then it has monthly frequency.

The method used in seasonal_decompose() to calculate frequency is: _maybe_get_pandas_wrapper_freq().

I did some research on seasonal_decompose(), and here are some links which might help you understand the function's source code:

source code of seasonal decomposition - https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/seasonal.py

Check out - _maybe_get_pandas_wrapper_freq https://searchcode.com/codesearch/view/86129760/

Hope this helps! Let me know if you find something interesting in addition to it.

Analyst17 answered Oct 27 '22 00:10


Two points on your code snippet.

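  1. On line 4 of your code you are resetting the index and sorting, but you are not assigning the result to anything, so the line has no effect. Assign the result back to a variable (inplace=True only helps on a single, unchained call). Note that reset_index() also turns the datetime index back into an ordinary column, so df_agg = df_agg.sort_index() is enough if you only want the rows in date order.
  2. seasonal_decompose works on time series, so your data needs to have a datetime index (you can create it either while loading the CSV or with the pd.to_datetime() function).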
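The two points can be sketched as follows. The three rows are a made-up subset in the style of the question's data, and the explicit format="%d/%m/%Y" is an assumption that the dates are day-first, as "15/11/2013" in the sample suggests.

```python
import pandas as pd

# Tiny illustrative frame; deliberately out of date order.
df_agg = pd.DataFrame({
    "Customer order actual date": ["23/12/2013", "15/11/2013", "04/03/2014"],
    "Sales Volumes": [200, 10, 5],
})

# Parse day-first dates explicitly, then make them the index.
df_agg["Customer order actual date"] = pd.to_datetime(
    df_agg["Customer order actual date"], format="%d/%m/%Y"
)
df_agg = df_agg.set_index("Customer order actual date")

# Sort by the index and keep the result by assigning it back.
df_agg = df_agg.sort_index()

print(type(df_agg.index).__name__)           # DatetimeIndex
print(df_agg.index.is_monotonic_increasing)  # True
```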
yosemite_k answered Oct 27 '22 00:10


First of all, if you hand np.asarray(...) to seasonal_decompose, it sees only a plain array; your index is gone. So get rid of the np.asarray.
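A quick way to see this, using a made-up three-day series rather than the question's data:

```python
import numpy as np
import pandas as pd

# Illustrative series with a daily DatetimeIndex.
series = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2014-01-01", periods=3, freq="D"),
)

as_array = np.asarray(series)

print(series.index.freqstr)        # D
print(type(as_array).__name__)     # ndarray
print(hasattr(as_array, "index"))  # False
```

The array keeps only the values, so any frequency information carried by the index never reaches the function.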

Secondly, if you look at df_agg['Sales Volumes'].index you will see freq=None, and that is what causes the function to complain. The index needs an actual frequency such as D (daily) or M (monthly). You can set one with df_agg = df_agg.asfreq('D').
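Here is a minimal sketch of that step, using the 2013 and 2014 rows from the question's sample (the 1900 and 2008 rows are left out to keep the range short, and day-first parsing is an assumption):

```python
import pandas as pd

# Order dates and volumes copied from the question's sample.
orders = pd.Series(
    [10, 200, 5, 30, 1, 290, 40],
    index=pd.to_datetime(
        ["15/11/2013", "23/12/2013", "04/03/2014", "17/03/2014",
         "22/04/2014", "26/06/2014", "30/06/2014"],
        format="%d/%m/%Y",
    ),
)
print(orders.index.freq)  # None

# asfreq reindexes onto a regular daily grid between the first and last date.
daily = orders.asfreq("D")
print(daily.index.freqstr)  # D

# Every day without an order is now a NaN row.
print(int(daily.isna().sum()), "of", len(daily), "values are NaN")
```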

Last but not least: your sample data do not follow any regular frequency. asfreq will fill in the missing dates, but you will get lots of NaN.

If you want to look up the abbreviations for frequencies, they are listed under "offset aliases" in the pandas time series documentation.

Rriskit answered Oct 26 '22 23:10