This question seems to have been tackled a lot, but I cannot figure out why seasonal_decompose doesn't work in my case, even though I am passing it a DataFrame with a DatetimeIndex. Here is an example of my dataset:
   Customer order actual date   Sales Volumes
0  01/01/1900                   300
1  10/03/2008                   3000
2  15/11/2013                   10
3  23/12/2013                   200
4  04/03/2014                   5
5  17/03/2014                   30
6  22/04/2014                   1
7  26/06/2014                   290
8  30/06/2014                   40
The code snippet is shown below:
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
df_agg['Customer order actual date'] = pd.to_datetime(df_agg['Customer order actual date'])
df_agg = df_agg.set_index('Customer order actual date')
df_agg.reset_index().sort_values('Customer order actual date', ascending=True)
decomposition = seasonal_decompose(np.asarray(df_agg['Sales Volumes']), model='multiplicative')
But I systematically get the following error:
ValueError: You must specify a freq or x must be a pandas object with a timeseries index witha freq not set to None
Could you please explain why I should give a freq input even though I am using a DataFrame with a DatetimeIndex? Does it make sense to give a frequency as an input parameter when I am looking for the seasonality as an output of seasonal_decompose?
In seasonal_decompose, the trend component is calculated as a centered moving average of the original series, the seasonal component is calculated as the per-period average of the detrended series, and the residual component is what remains after removing the trend and seasonal components from the time series.
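To make the three steps concrete, here is a minimal NumPy sketch of the additive variant of this classical decomposition. It is not statsmodels' own code: the function name classical_decompose is hypothetical, and centering the per-period averages so they sum to zero is a convention assumed here. Note that the period has to be known up front, which is exactly why seasonal_decompose needs a frequency.

```python
import numpy as np

def classical_decompose(x, period):
    """Hypothetical additive classical decomposition (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    # 1) Trend: centered moving average. For an even period a "2 x period"
    #    average (half weights at both ends) keeps the window centered.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # the ends cannot be estimated
    trend[half:n - half] = np.convolve(x, weights, mode="valid")

    # 2) Seasonal: average the detrended values at each position of the cycle
    detrended = x - trend
    period_avg = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    period_avg -= period_avg.mean()  # assumed convention: effects sum to zero
    seasonal = np.tile(period_avg, n // period + 1)[:n]

    # 3) Residual: what is left after removing trend and seasonal parts
    resid = x - trend - seasonal
    return trend, seasonal, resid

# Usage: a linear trend plus a repeating 4-step pattern
t = np.arange(24)
x = 0.5 * t + np.tile([1.0, -1.0, 2.0, -2.0], 6)
trend, seasonal, resid = classical_decompose(x, period=4)
print(np.round(seasonal[:4], 6))
```

On this synthetic input the sketch recovers the repeating pattern as the seasonal component and leaves (numerically) zero residuals wherever the trend is defined.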
MSTL: What is it? MSTL stands for Multiple Seasonal-Trend decomposition using Loess [1]. It is a method to decompose a time series into a trend component, multiple seasonal components, and a residual component.
And in case you want to know what frequency means in seasonal_decompose(): it is a property of your data. If you collected your data month by month, then it has a monthly frequency.
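A small pandas sketch of that idea, using hypothetical timestamps: the frequency can be read off regularly spaced dates even when no frequency was declared on the index.

```python
import pandas as pd

# Hypothetical dates of data collected on the 1st of every month; built from
# strings, so the index carries no explicitly declared freq
monthly_index = pd.DatetimeIndex(["2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01"])
# Hypothetical dates of data collected once a day
daily_index = pd.DatetimeIndex(["2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04"])

print(monthly_index.freq)            # None: nothing was declared
print(pd.infer_freq(monthly_index))  # 'MS' (month start), inferred from the dates
print(pd.infer_freq(daily_index))    # 'D' (daily)
```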
The seasonal_decompose function gets the frequency from the inferred_freq attribute of the index. Here is the link: https://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DatetimeIndex.html
inferred_freq, in turn, is computed by infer_freq, which looks at the actual timestamps: it returns None when they are not regularly spaced, even if the object has a DatetimeIndex. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.infer_freq.html
This is why freq needs to be set even with a timeseries index: if the dates are irregular, there is no frequency to infer.
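Applied to the sample dates from the question, a quick check shows that there is nothing to infer. The explicit day-first format string is an assumption based on dates such as 15/11/2013.

```python
import pandas as pd

# Sample dates from the question, parsed day-first
dates = pd.to_datetime(
    ["01/01/1900", "10/03/2008", "15/11/2013", "23/12/2013", "04/03/2014",
     "17/03/2014", "22/04/2014", "26/06/2014", "30/06/2014"],
    format="%d/%m/%Y",
)
sales = pd.Series([300, 3000, 10, 200, 5, 30, 1, 290, 40], index=dates)

print(sales.index.freq)           # None: no frequency was declared
print(sales.index.inferred_freq)  # None: the dates are irregularly spaced
```

Because both are None, seasonal_decompose has no way to derive a seasonal period from this index and raises the error above.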
The helper that seasonal_decompose() uses to obtain the frequency is _maybe_get_pandas_wrapper_freq().
I did some research on seasonal_decompose(), and here are some links that might help you understand the function's source code:
Source code of seasonal_decompose: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/seasonal.py
_maybe_get_pandas_wrapper_freq: https://searchcode.com/codesearch/view/86129760/
Hope this helps! Let me know if you find something interesting in addition to it.
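Two points on your code snippet, after a side note. The line df_agg.reset_index().sort_values('Customer order actual date', ascending=True) returns a new DataFrame and leaves df_agg untouched, so it has no effect unless you assign the result back (e.g. df_agg = df_agg.sort_index()) or sort df_agg itself with inplace=True. Also, your dates are day-first (e.g. 15/11/2013), so pass dayfirst=True (or format='%d/%m/%Y') to the pd.to_datetime() function; otherwise ambiguous dates such as 10/03/2008 may be read month-first.
First of all, if you hand np.asarray(...) to seasonal_decompose, it sees only an array and your index is gone. So get rid of the np.asarray.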
Secondly, if you look at df_agg['Sales Volumes'].index, you will see that freq=None; that's what makes the function complain. You need an actual frequency like D (daily) or M (monthly). You can set one with df_agg = df_agg.asfreq('D'); asfreq returns a new DataFrame, so assign the result.
Last, but not least: your sample dates do not follow any regular frequency. asfreq will fill in the missing dates, but you get lots of NaN values, and seasonal_decompose does not accept missing values, so you have to fill or aggregate them first.
If you want to look up the abbreviations for frequencies, they are listed under "Offset aliases" in the pandas documentation.