Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Detect Anomaly Intervals with SQL

My problem is simple: I have a table with a series of statuses and timestamps (for the sake of curiosity, these statuses indicate alarm levels) and I would like to query this table in order to get duration between two statuses.

Seems simple, but here comes the tricky part: I can´t create look-up tables, procedures and it should be as fast as possible as this table is a little monster holding over 1 billion records (no kidding!)...

The schema is drop dead simple:

[pk] Time Value

(actualy, there is a second pk but it is useless for this)

And below a real world example:

Timestamp          Status
2013-1-1 00:00:00    1
2013-1-1 00:00:05    2
2013-1-1 00:00:10    2
2013-1-1 00:00:15    2
2013-1-1 00:00:20    0
2013-1-1 00:00:25    1
2013-1-1 00:00:30    2
2013-1-1 00:00:35    2
2013-1-1 00:00:40    0

The output, considering only a level 2 alarm, should be as follow should report the begin of a level 2 alarm an its end (when reach 0):

StartTime          EndTime            Interval
2013-1-1 00:00:05  2013-1-1 00:00:20     15
2013-1-1 00:00:30  2013-1-1 00:00:40     10

I have been trying all sorts of inner joins, but all of them lead me to an amazing Cartesian explosion. Can you guys help me figure out a way to accomplish this?

Thanks!

like image 571
Rafa Borges Avatar asked Jan 24 '13 19:01

Rafa Borges


People also ask

How do you detect anomalies in a data set?

How to detect Anomalies? Simple statistical techniques such as mean, median, quantiles can be used to detect univariate anomalies feature values in the dataset. Various data visualization and exploratory data analysis techniques can be also be used to detect anomalies.

What are the three 3 basic approaches to anomaly detection?

However, these types of micro clusters can often be identified more readily by a cluster analysis algorithm. There are three main classes of anomaly detection techniques: unsupervised, semi-supervised, and supervised. Essentially, the correct anomaly detection method depends on the available labels in the dataset.

What is anomaly in SQL Server?

An update anomaly is a data inconsistency that results from data redundancy and a partial update. For example, each employee in a company has a department associated with them as well as the student group they participate in. Employee_ID.


1 Answers

This has to be one of the harder questions I've seen today - thanks! I assume you can use CTEs? If so, try something like this:

;WITH Filtered
AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY dateField) RN, dateField, Status
    FROM Test    
)
SELECT F1.RN, F3.MinRN,
    F1.dateField StartDate,
    F2.dateField Enddate
FROM Filtered      F1, Filtered F2, (
SELECT F1a.RN, MIN(F3a.RN) as MinRN
FROM Filtered      F1a
   JOIN Filtered F2a ON F1a.RN = F2a.RN+1 AND F1a.Status = 2 AND F2a.Status <> 2
   JOIN Filtered F3a ON F1a.RN < F3a.RN AND F3a.Status <> 2
GROUP BY F1a.RN ) F3 
WHERE F1.RN = F3.RN AND F2.RN = F3.MinRN

And the Fiddle. I didn't add the intervals, but I imagine you can handle that part from here.

Good luck.

like image 88
sgeddes Avatar answered Oct 03 '22 02:10

sgeddes