Our webapp collects a huge amount of data about user actions, network activity, database load, and so on. All of this data is stored in warehouses, and we have quite a few interesting views on it. If something odd happens, chances are it shows up somewhere in the data.
However, to detect that something out of the ordinary is going on, someone currently has to look through this data continually and hunt for oddities.
My question: what is the best way to detect changes in dynamic data that can be considered 'out of the ordinary'? Are Bayesian filters (I've seen these mentioned when reading about spam detection) the way to go?
Any pointers would be great!
EDIT: To clarify, the data shows, for example, a daily curve of database load. This curve typically looks similar to yesterday's curve, and over time it may change slowly.
It would be nice if a warning could go off when the day-to-day change in the curve exceeds some parameters.
There are three main classes of anomaly detection techniques: unsupervised, semi-supervised, and supervised. Essentially, the right class of method depends on which labels are available in your dataset.
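For data like yours, with no labeled anomalies, the unsupervised class is the usual starting point. A minimal sketch using scikit-learn's IsolationForest, one common unsupervised detector; the metric values here are synthetic and only for illustration:

```python
# Unsupervised example: no labels needed, the model learns what "normal"
# looks like and flags deviations. Assumes scikit-learn is installed;
# the two "metrics" (db load, request rate) are made up for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=[50.0, 200.0], scale=[5.0, 20.0], size=(500, 2))
spikes = rng.normal(loc=[90.0, 400.0], scale=[5.0, 20.0], size=(5, 2))
X = np.vstack([normal, spikes])

# contamination is the expected fraction of anomalies; tune it for your data.
model = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = model.predict(X)          # +1 = normal, -1 = anomaly
print("flagged rows:", np.where(labels == -1)[0])
```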
Evaluating anomaly detection on time series data requires non-traditional performance metrics. Standard point-based metrics are often used to score detections, but they cannot accurately evaluate detectors on time series, where an anomaly typically spans a range of points rather than a single one.
Beyond accuracy, the most commonly used metrics for evaluating anomaly detection solutions are F1, precision, and recall. One can think about these metrics in the following way: recall answers the question "What proportion of the true anomalies was identified?", precision answers "What proportion of the points flagged as anomalies were real ones?", and F1 is the harmonic mean of the two.
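A quick illustration of how these metrics behave, assuming scikit-learn and using made-up ground-truth and predicted labels:

```python
# Precision / Recall / F1 on made-up binary labels
# (1 = anomaly, 0 = normal). Assumes scikit-learn.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # ground-truth anomalies
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0]  # detector output

print("precision:", precision_score(y_true, y_pred))  # flagged that were real: 3/4
print("recall:   ", recall_score(y_true, y_pred))     # real that were flagged: 3/4
print("f1:       ", f1_score(y_true, y_pred))         # harmonic mean of the two
```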
When anomalies exist in the data, the median gives a more robust central value than the mean, because the median is the middle of the sorted values while the mean averages over every value, outliers included. To find outliers in the right and left tails of the data, use the fences Q3 + 1.5(IQR) and Q1 - 1.5(IQR).
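A minimal sketch of this IQR rule in plain NumPy, on made-up load values:

```python
# Tukey's IQR fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
# are flagged as outliers. The values below are invented for illustration.
import numpy as np

values = np.array([48, 50, 51, 49, 52, 50, 47, 51, 95])  # 95 is an injected spike
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(f"fences: [{lower:.1f}, {upper:.1f}], outliers: {outliers}")
```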
Take a look at Control Charts; they provide a way to track changes in your data visually and to flag when the data is "out of control" or anomalous. They are heavily used in manufacturing to ensure quality control.
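As a rough illustration (a sketch, not a full SPC implementation with run rules), a Shewhart-style individuals chart can be approximated by estimating 3-sigma limits from a baseline window, assuming NumPy and synthetic data:

```python
# Bare-bones Shewhart-style check: estimate the center line and 3-sigma
# control limits from an "in control" baseline, then flag new points
# that fall outside them. Baseline and measurements are synthetic.
import numpy as np

baseline = np.random.default_rng(1).normal(50.0, 5.0, size=200)
center = baseline.mean()
sigma = baseline.std(ddof=1)
ucl, lcl = center + 3 * sigma, center - 3 * sigma  # upper/lower control limits

for t, x in enumerate([52.0, 49.5, 78.0, 51.0]):   # incoming daily measurements
    if not lcl <= x <= ucl:
        print(f"t={t}: {x} outside [{lcl:.1f}, {ucl:.1f}] -> out of control")
```

This maps directly onto your daily-load-curve use case: fit the limits on recent history and alert whenever today's value escapes them.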
This question is impossible to answer without knowing much more about the particular data you have. For an overview of what kinds of approaches exist, see Anomaly Detection: A Survey by Chandola, Banerjee, and Kumar.