Let's say I have the following table:
CREATE TABLE `occurences` (
`object_id` int(10) NOT NULL,
`seen_timestamp` int(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
which contains the ID of an object (not unique, it repeats) and the timestamp at which that object ID was observed.
Observation runs 24/7 and inserts every occurrence of an object ID with the current timestamp.
Now I want to write a query that selects all object IDs which have been seen at least 7 times during any 10-minute period.
It should work like intrusion detection.
A similar algorithm is used in the denyhost script, which checks for invalid SSH logins: if it finds a configured number of occurrences during a configured time period, it blocks the IP.
Any good suggestions?
SET @num_occurences = 7; -- how many occurrences must fall within the interval
SET @max_period = 600; -- your interval in seconds (600 = 10 minutes, assuming seen_timestamp holds Unix seconds)
SELECT offset_start.object_id FROM
(SELECT @rownum_start := @rownum_start+1 AS idx, object_id, seen_timestamp
FROM occurences, (SELECT @rownum_start:=0) r ORDER BY object_id ASC, seen_timestamp ASC) offset_start
JOIN
(SELECT @rownum_end := @rownum_end + 1 AS idx, object_id, seen_timestamp
FROM occurences, (SELECT @rownum_end:=0) r ORDER BY object_id ASC, seen_timestamp ASC) offset_end
ON offset_start.object_id = offset_end.object_id
AND offset_start.idx + @num_occurences - 1 = offset_end.idx
AND offset_end.seen_timestamp - offset_start.seen_timestamp <= @max_period
GROUP BY offset_start.object_id;
You can move @num_occurences and @max_period into your code and set them as parameters of your statement. Depending on your client, you can also move the initialisation of @rownum_start and @rownum_end in front of the query; that might improve query performance (you should test that nonetheless, it's just a gut feeling from looking at the EXPLAIN output of both versions).
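As a rough sketch of that variant (not benchmarked, and assuming seen_timestamp holds Unix seconds, hence 600 for a 10-minute window), the counters are reset with SET before the query instead of inside the derived tables:

```sql
-- Parameters; in application code these could be bound statement parameters instead.
SET @num_occurences = 7;   -- occurrences required inside the window
SET @max_period = 600;     -- window length in seconds (assumption: Unix seconds)

-- Row counters initialised up front rather than via (SELECT @var := 0) r
SET @rownum_start = 0;
SET @rownum_end = 0;

SELECT offset_start.object_id
FROM (SELECT @rownum_start := @rownum_start + 1 AS idx, object_id, seen_timestamp
      FROM occurences
      ORDER BY object_id ASC, seen_timestamp ASC) offset_start
JOIN (SELECT @rownum_end := @rownum_end + 1 AS idx, object_id, seen_timestamp
      FROM occurences
      ORDER BY object_id ASC, seen_timestamp ASC) offset_end
  ON offset_start.object_id = offset_end.object_id
 AND offset_start.idx + @num_occurences - 1 = offset_end.idx
 AND offset_end.seen_timestamp - offset_start.seen_timestamp <= @max_period
GROUP BY offset_start.object_id;
```

Like the version above, this relies on MySQL numbering the rows in ORDER BY order, which the server does not guarantee for user variables, so check the result on your version before trusting it.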
It selects the entire table twice and joins each row of offset_start with the row in offset_end that lies @num_occurences - 1 rows further on. (The @rownum_* variables are used to create an index for each row, simulating the row_number() functionality known from other RDBMSs.)
Then it just checks whether the two rows refer to the same object_id and satisfy the period requirement.
Since this is done for every row of occurences, an object_id would be returned multiple times if its number of occurrences is actually larger than @num_occurences, so the result is grouped at the end to make the returned object_ids unique.
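On MySQL 8.0 or later, which has window functions, the same idea can be sketched without user variables. This is only an illustration, not part of the original answer: it assumes seen_timestamp holds Unix seconds and uses literal values (6 = 7 occurrences minus 1, 600 = 10 minutes) in place of the variables. The alias earlier_timestamp is a made-up name.

```sql
-- For every row, look up the timestamp 6 rows earlier for the same object_id.
-- If that earlier row is at most 600 seconds older, then 7 occurrences
-- fall inside one 10-minute window.
SELECT DISTINCT object_id
FROM (
    SELECT object_id,
           seen_timestamp,
           LAG(seen_timestamp, 6) OVER (
               PARTITION BY object_id
               ORDER BY seen_timestamp
           ) AS earlier_timestamp   -- NULL for the first 6 rows of each object
    FROM occurences
) AS windowed
WHERE seen_timestamp - earlier_timestamp <= 600;
```

Rows with fewer than 6 predecessors get a NULL earlier_timestamp and are dropped by the WHERE clause, and DISTINCT plays the role of the GROUP BY in the join version.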
You could try
SELECT object_id, COUNT(seen_timestamp) AS tot FROM occurences
WHERE seen_timestamp BETWEEN
DATE_ADD(your_dt, INTERVAL -10 MINUTE) AND your_dt
GROUP BY object_id
HAVING tot >= 7
I don't understand why you use int(10) for seen_timestamp: you could use a datetime...
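If the column stays an int as in the question, the same single-window check could look roughly like the sketch below. It assumes seen_timestamp holds Unix seconds; @window_end is a made-up variable standing in for your_dt.

```sql
-- Assumption: the window ends "now"; substitute any Unix timestamp of interest.
SET @window_end = UNIX_TIMESTAMP();

SELECT object_id,
       COUNT(*) AS tot,
       FROM_UNIXTIME(MAX(seen_timestamp)) AS last_seen  -- readable datetime for the int value
FROM occurences
WHERE seen_timestamp BETWEEN @window_end - 600 AND @window_end  -- 600 s = 10 minutes
GROUP BY object_id
HAVING tot >= 7;
```

Note that this only inspects the one 10-minute window ending at @window_end, not every possible window, so it fits a check that is run periodically rather than a search over the whole history.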