Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simple algorithm for online outlier detection of a generic time series

I am working with a large amount of time series. These time series are basically network measurements coming every 10 minutes, and some of them are periodic (i.e. the bandwidth), while some other aren't (i.e. the amount of routing traffic).

I would like a simple algorithm for doing an online "outlier detection". Basically, I want to keep in memory (or on disk) the whole historical data for each time series, and I want to detect any outlier in a live scenario (each time a new sample is captured). What is the best way to achieve these results?

I'm currently using a moving average in order to remove some noise, but then what next? Simple things like standard deviation, mad, ... against the whole data set doesn't work well (I can't assume the time series are stationary), and I would like something more "accurate", ideally a black box like:

double outlier_detection(double* vector, double value);

where vector is the array of double containing the historical data, and the return value is the anomaly score for the new sample "value" .

like image 250
Gianluca Avatar asked Aug 02 '10 18:08

Gianluca


Video Answer


1 Answers

This is a big and complex subject, and the answer will depend on (a) how much effort you want to invest in this and (b) how effective you want your outlier detection to be. One possible approach is adaptive filtering, which is typically used for applications like noise cancelling headphones, etc. You have a filter which constantly adapts to the input signal, effectively matching its filter coefficients to a hypothetical short term model of the signal source, thereby reducing mean square error output. This then gives you a low level output signal (the residual error) except for when you get an outlier, which will result in a spike, which will be easy to detect (threshold). Read up on adaptive filtering, LMS filters, etc, if you're serious about this kind of technique.

like image 152
Paul R Avatar answered Oct 26 '22 02:10

Paul R