I have a DataFrame with a few time series:
divida movav12 var varmovav12
Date
2004-01 0 NaN NaN NaN
2004-02 0 NaN NaN NaN
2004-03 0 NaN NaN NaN
2004-04 34 NaN inf NaN
2004-05 30 NaN -0.117647 NaN
2004-06 44 NaN 0.466667 NaN
2004-07 35 NaN -0.204545 NaN
2004-08 31 NaN -0.114286 NaN
2004-09 30 NaN -0.032258 NaN
2004-10 24 NaN -0.200000 NaN
2004-11 41 NaN 0.708333 NaN
2004-12 29 24.833333 -0.292683 NaN
2005-01 31 27.416667 0.068966 0.104027
2005-02 28 29.750000 -0.096774 0.085106
2005-03 27 32.000000 -0.035714 0.075630
2005-04 30 31.666667 0.111111 -0.010417
2005-05 31 31.750000 0.033333 0.002632
2005-06 39 31.333333 0.258065 -0.013123
2005-07 36 31.416667 -0.076923 0.002660
I want to decompose the first time series, divida, so that I can separate its trend from its seasonal and residual components.
I found an answer here, and am trying to use the following code:
import statsmodels.api as sm
s=sm.tsa.seasonal_decompose(divida.divida)
However I keep getting this error:
Traceback (most recent call last):
  File "/Users/Pred_UnBR_Mod2.py", line 78, in <module>
    s=sm.tsa.seasonal_decompose(divida.divida)
  File "/Library/Python/2.7/site-packages/statsmodels/tsa/seasonal.py", line 58, in seasonal_decompose
    _pandas_wrapper, pfreq = _maybe_get_pandas_wrapper_freq(x)
  File "/Library/Python/2.7/site-packages/statsmodels/tsa/filters/_utils.py", line 46, in _maybe_get_pandas_wrapper_freq
    freq = index.inferred_freq
AttributeError: 'Index' object has no attribute 'inferred_freq'
How can I proceed?
For example, an STL plot (seasonal decomposition of time series by Loess) decomposes a time series into seasonal, trend, and irregular components using loess and plots the components separately. The cyclical component, if present in the data, is included in the "trend" component plot.
The trend component is calculated as a centered moving average of the original series. The seasonal component is calculated as the per-period average of the detrended series. The residual component is what remains after removing the trend and seasonal components from the time series.
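The three steps just described can be sketched in plain Python. This is only an illustrative sketch of the classical additive scheme, not the statsmodels implementation; the function names `centered_moving_average` and `decompose_additive` are made up for this example, and the code assumes the series covers at least two full periods so that every position in the cycle has a detrended value to average.

```python
from statistics import mean


def centered_moving_average(values, period):
    """Centered moving average; positions without a full window are None."""
    n = len(values)
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: the window has period + 1 points, so the two
            # end points get half weight (a "2 x period" moving average).
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[i] = total / period
    return trend


def decompose_additive(values, period):
    """Split values into trend, seasonal and residual lists (additive model)."""
    trend = centered_moving_average(values, period)
    detrended = [None if t is None else v - t for v, t in zip(values, trend)]
    # Average the detrended values that share the same position in the cycle.
    raw = [mean(d for d in detrended[k::period] if d is not None)
           for k in range(period)]
    # Center the seasonal pattern so it averages to zero over one period.
    offset = mean(raw)
    pattern = [r - offset for r in raw]
    seasonal = [pattern[i % period] for i in range(len(values))]
    resid = [None if t is None else v - t - s
             for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, resid


# Synthetic data (not from the question): linear trend plus a period-4 pattern.
values = [10 + 0.5 * i + [2, -1, -3, 2][i % 4] for i in range(16)]
trend, seasonal, resid = decompose_additive(values, 4)
```

On this synthetic series the recovered trend is the straight line and the residuals are zero up to floating-point error, because the moving average cancels a pattern that sums to zero over one period.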
Trend, as its name suggests, is the overall direction of the data. Seasonality is a periodic component. And the residual is what's left over when the trend and seasonality have been removed. Residuals are random fluctuations. You can think of them as a noise component.
Simply put, time series decomposition is the process of deconstructing a time series into components, including the seasonal component: behaviors captured in individual seasonal periods.
The trend and seasonality components are optional. In time series data, these components are combined either additively or multiplicatively. An additive model is one in which the variance of the data does not change across different values of the time series. The systematic component is the arithmetic sum of the individual effects of the predictors.
Here “STL” stands for “Seasonal and Trend decomposition using Loess”. It is a more robust and versatile decomposition method. Loess is a method for estimating non-linear relationships. The seasonal component is allowed to change over time, and its rate of change is controlled by the user. The smoothness of the trend-cycle is also controlled by the user.
If the seasonal and noise components change the trend by an amount that is independent of the value of the trend, then the trend, seasonal and noise components are said to behave in an additive way. One can represent this situation as y_i = t_i + s_i + n_i, where y_i is the value of the time series at the ith time step and t_i, s_i and n_i are the trend, seasonal and noise components at that step.
Works fine when you convert your index to a DatetimeIndex:
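import pandas as pd
import statsmodels.api as sm

# Turn the 'Date' labels into real timestamps so the frequency can be inferred
df.reset_index(inplace=True)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
s = sm.tsa.seasonal_decompose(df.divida)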
<statsmodels.tsa.seasonal.DecomposeResult object at 0x110ec3710>
Access the components via:
s.resid
s.seasonal
s.trend
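To see why the conversion helps, here is a small pandas-only sketch. It uses a made-up stand-in for the question's data (string labels such as "2004-01" with placeholder values), so the variable names and numbers are assumptions rather than the asker's actual frame. A plain index of strings carries no frequency information, while the converted DatetimeIndex lets pandas infer a month-start frequency.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame: monthly string labels.
labels = ["2004-%02d" % m for m in range(1, 13)] + \
         ["2005-%02d" % m for m in range(1, 8)]
df = pd.DataFrame({"divida": range(len(labels))},
                  index=pd.Index(labels, name="Date"))

before = type(df.index).__name__     # a generic Index of strings

# Parse the labels into timestamps; "2004-01" becomes 2004-01-01.
df.index = pd.to_datetime(df.index)

after = type(df.index).__name__      # now a DatetimeIndex
inferred = df.index.inferred_freq    # "MS" (month start) for this index
```

This assigns the parsed index directly instead of going through reset_index and set_index; the resulting index should be the same either way.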
Statsmodels will decompose the series only if it knows the frequency. Usually a time series index carries a frequency (e.g. daily, business days, weekly); here it does not, so the error is raised. You can remove this error in two ways:
1. Convert the index with the pandas DateTime function (pd.to_datetime), as in the answer above. It uses the internal function infer_freq to find the frequency, so the index comes back with a frequency that can be inferred.
2. Set the frequency on the index yourself: df.index.asfreq(freq='m'). Here m represents month. You can set the frequency if you have domain knowledge or by d
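As a hedged sketch of setting the frequency explicitly, the snippet below uses `DataFrame.asfreq`, which is the pandas call I am sure exists for this purpose; it is a substitute for, not a copy of, the `df.index.asfreq` expression above. The four rows are taken from the top of the question's table, and the alias "MS" (month start) is an assumption that matches timestamps parsed from labels like "2004-01".

```python
import pandas as pd

# First four rows of the question's 'divida' column, indexed by parsed dates.
idx = pd.to_datetime(["2004-01", "2004-02", "2004-03", "2004-04"])
df = pd.DataFrame({"divida": [0, 0, 0, 34]}, index=idx)

freq_before = df.index.freq      # None: to_datetime does not attach a frequency

# Conform the frame to a month-start frequency; this also fixes index.freq.
df = df.asfreq("MS")

freq_after = df.index.freqstr    # "MS"
```

A month-end alias would reindex these month-start timestamps onto month-end dates and leave the values as NaN, which is why "MS" is used here. Alternatively, recent statsmodels releases accept the cycle length directly, as in sm.tsa.seasonal_decompose(df.divida, period=12); older releases named that argument freq.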