How to detect anomalies specifically in time series data with trend and seasonality?

Tags:

python

machine-learning

time-series

anomaly-detection

I want to detect outliers in time series data that contains trend and seasonality components. I want to ignore the peaks that are seasonal, consider only the other peaks, and label those as outliers. As I am new to time series analysis, please help me approach this problem.

I am using Python.

Attempt 1 : Using ARIMA model

I trained the model and forecast the test data. I then computed the difference between the forecasts and the actual test values, and flagged outliers based on the observed variance of that difference.
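A minimal sketch of the variance-based flagging described above. It assumes you already have aligned arrays of actual and forecast values, for example test_log and predict_log from the loop below, both on the log scale. The function name flag_forecast_outliers, the threshold k=3 and the synthetic data are hypothetical choices for illustration and are not part of the original code.

```python
import numpy as np


def flag_forecast_outliers(actual, predicted, k=3.0):
    """Return indices where the forecast error is more than k standard deviations from its mean."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    z = (errors - errors.mean()) / errors.std()
    return np.flatnonzero(np.abs(z) > k)


# Hypothetical data: forecasts track the actuals closely except at index 30.
rng = np.random.default_rng(42)
predicted = np.linspace(100, 120, 50)
actual = predicted + rng.normal(0, 0.1, 50)
actual[30] += 5.0

print(flag_forecast_outliers(actual, predicted))  # -> [30]
```

A single large error also inflates the standard deviation it is compared against. When there may be several outliers, a robust scale estimate such as the median absolute deviation is a common alternative.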

Implementation of Auto Arima

# pyramid-arima was renamed to pmdarima; train_log comes from the log step below
!pip install pmdarima
from pmdarima.arima import auto_arima
stepwise_model = auto_arima(train_log, start_p=1, start_q=1, max_p=3, max_q=3, m=7, start_P=0,
                            seasonal=True, d=1, D=1, trace=True, error_action='ignore',
                            suppress_warnings=True, stepwise=True)

import math
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.tsa.api as smt
from sklearn.metrics import mean_squared_error

Split data into train and test sets

train, test = actual_vals[0:-70], actual_vals[-70:]

Log Transformation

train_log, test_log = np.log10(train), np.log10(test)

Converting to list

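history = list(train_log)  # observations seen so far (log scale)
predictions = []           # forecasts back-transformed to the original scale
predict_log = []           # forecasts on the log scale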

Fitting Stepwise ARIMA model

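for t in range(len(test_log)):
    # refit on all data seen so far, then forecast one step ahead
    stepwise_model.fit(history)
    output = stepwise_model.predict(n_periods=1)
    predict_log.append(output[0])
    yhat = 10**output[0]  # undo the log10 transform
    predictions.append(yhat)
    obs = test_log[t]
    history.append(obs)  # add the true observation before the next step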

Plotting

figsize = (12, 7)
plt.figure(figsize=figsize)
plt.plot(test, label='Actuals')
plt.plot(predictions, color='red', label='Predicted')
plt.legend(loc='upper right')
plt.show()

However, this only detects outliers in the test data. I need to detect outliers across the whole time series, including the training data.

Attempt 2 : Using Seasonal Decomposition

I used the code below to decompose the original data into seasonal, trend and residual components, which are shown in the image below.

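from statsmodels.tsa.seasonal import seasonal_decompose

# period=7 is an assumption (weekly cycle, matching m=7 above); use your own season length
decomposed = seasonal_decompose(actual_vals, model='additive', period=7)
decomposed.plot()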

[Image: decomposition of the series into trend, seasonal and residual components]

I then use the residuals to find outliers with a boxplot, since the seasonal and trend components have been removed. Does this make sense?

Or is there a simpler or better approach?


asked Jul 17 '19 06:07

Raja Sahe S

1 Answer

You can:

- In the 4th graph (the residual plot) under "Attempt 2 : Using Seasonal Decomposition", look for extreme points. These may lead you to anomalies in the seasonal series.
- Supervised (if you have some labeled data): train a classifier.
- Unsupervised: predict the next value, build a confidence interval around the prediction, and check whether the actual value lies inside it.
- Calculate the relative extrema of the data, for example using argrelextrema:

import numpy as np
from scipy.signal import argrelextrema

x = np.array([2, 1, 2, 3, 2, 0, 1, 0])
argrelextrema(x, np.greater)

output:

(array([3, 6]),)

Some random data (my implementation of the above argrelextrema): [image]


answered Sep 28 '22 00:09

Dor
