Find events that occurred x times during a given period

Let's say I have the following table:

CREATE TABLE `occurences` (
  `object_id` int(10) NOT NULL,
  `seen_timestamp` int(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8

It contains the ID of an object (not unique, it repeats) and the timestamp at which that object ID was observed.

Observation runs 24/7 and inserts every occurrence of an object ID with the current timestamp.

Now I want to write a query that selects all object IDs that have been seen at least 7 times during any 10-minute period.

It should work like intrusion detection.

A similar algorithm is used in the DenyHosts script, which checks for invalid SSH logins: if it finds the configured number of occurrences within the configured time period, it blocks the IP.

Any good suggestions?

rkosegi asked Apr 05 '12 12:04


2 Answers

This should work:

SET @num_occurences = 7;  -- how many occurrences must fall within the interval
SET @max_period = 600;    -- interval in seconds (10 minutes), assuming seen_timestamp holds Unix seconds

SELECT offset_start.object_id
FROM
  -- number every row, ordered by object and then by time
  (SELECT @rownum_start := @rownum_start + 1 AS idx, object_id, seen_timestamp
   FROM occurences, (SELECT @rownum_start := 0) r
   ORDER BY object_id ASC, seen_timestamp ASC) offset_start
JOIN
  (SELECT @rownum_end := @rownum_end + 1 AS idx, object_id, seen_timestamp
   FROM occurences, (SELECT @rownum_end := 0) r
   ORDER BY object_id ASC, seen_timestamp ASC) offset_end
    -- pair each row with the row (@num_occurences - 1) positions later
    ON offset_start.object_id = offset_end.object_id
   AND offset_start.idx + @num_occurences - 1 = offset_end.idx
   AND offset_end.seen_timestamp - offset_start.seen_timestamp <= @max_period
GROUP BY offset_start.object_id;

You can move @num_occurences and @max_period into your code and pass them as parameters of your statement. Depending on your client, you can also move the initialisation of @rownum_start and @rownum_end in front of the query; that might improve query performance (you should test it nonetheless, it's just a gut feeling from looking at the EXPLAIN output of both versions).

Here's how it works:

It selects the entire table twice and joins each row of offset_start with the row in offset_end that lies @num_occurences - 1 positions later. (This is done using the @rownum_* variables to create an index for each row, simulating the row_number() functionality known from other RDBMSs.)
Then it just checks whether the two rows refer to the same object_id and satisfy the period requirement.
Since this is done for every row of occurences, an object_id would be returned multiple times if it has more than @num_occurences occurrences within the period, so the result is grouped at the end to make the returned object_ids unique.
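The same offset comparison can be sketched outside SQL, which may help when checking the query's result against sample data. This is a minimal Python sketch and not part of the original answer: the function name `ids_seen_n_times` and its parameters are hypothetical, and timestamps are assumed to be Unix seconds.

```python
from collections import defaultdict


def ids_seen_n_times(rows, num_occurrences=7, max_period=600):
    """Return the sorted object IDs seen at least num_occurrences times
    within some span of max_period seconds.

    rows is an iterable of (object_id, seen_timestamp) pairs; timestamps
    are assumed to be Unix seconds (an assumption, not stated in the schema).
    """
    # Collect timestamps per object, the equivalent of matching on object_id.
    by_object = defaultdict(list)
    for object_id, seen_timestamp in rows:
        by_object[object_id].append(seen_timestamp)

    hits = set()
    for object_id, stamps in by_object.items():
        stamps.sort()
        # Compare each row with the row (num_occurrences - 1) positions later.
        for start in range(len(stamps) - num_occurrences + 1):
            end = start + num_occurrences - 1
            if stamps[end] - stamps[start] <= max_period:
                hits.add(object_id)
                break
    return sorted(hits)


sample = [(1, t) for t in range(0, 700, 100)]     # 7 sightings within 600 s
sample += [(2, t) for t in range(0, 7000, 1000)]  # 7 sightings, too spread out
print(ids_seen_n_times(sample))  # prints [1]
```

Sorting once per object and looking a fixed number of positions ahead avoids counting every possible window separately, which is the same idea the joined row indexes express in the query.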

ddelbondio answered Oct 10 '22 00:10


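You could try:

SELECT object_id, COUNT(seen_timestamp) AS tot
FROM occurences
-- seen_timestamp is an int, so compare Unix seconds (assuming that is what it stores)
WHERE seen_timestamp BETWEEN
    UNIX_TIMESTAMP(DATE_ADD(your_dt, INTERVAL -10 MINUTE)) AND UNIX_TIMESTAMP(your_dt)
GROUP BY object_id
HAVING tot >= 7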

I don't understand why you use int(10) for seen_timestamp; you could use a DATETIME column instead.
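The query above counts sightings in one fixed window ending at your_dt, not in every possible 10-minute window. A rough Python sketch of that single-window count follows; `ids_in_window` and its parameters are hypothetical names used only for illustration, and timestamps are assumed to be Unix seconds.

```python
from collections import Counter


def ids_in_window(rows, window_end, period=600, threshold=7):
    """Return sorted object IDs seen at least threshold times in
    [window_end - period, window_end] (inclusive, like SQL BETWEEN)."""
    counts = Counter(
        object_id
        for object_id, seen_timestamp in rows
        if window_end - period <= seen_timestamp <= window_end
    )
    return sorted(oid for oid, total in counts.items() if total >= threshold)


rows = [(1, 1000 + 60 * i) for i in range(7)]   # 7 sightings, 1000..1360
rows += [(2, 1000 + 60 * i) for i in range(6)]  # only 6 sightings
print(ids_in_window(rows, window_end=1400))  # prints [1]
print(ids_in_window(rows, window_end=5000))  # prints []
```

To cover any 10-minute period, this check would have to be repeated for each candidate window end, so it fits best when only the most recent window matters.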

Marco answered Oct 10 '22 00:10