I have an event table that has the following fields:
event_id
event_type
event_time
Given a duration D and a number k, I need a count of all the event_types that had more than k events in any relative time window of duration D. This basically requires a sliding window with respect to each event. For example, I want all the event_types that had activity of more than 5 events in any 10-minute duration.
I am not sure how to work around this without window functions.
(I am on MySQL 5.6, and the dataset is under 1 million rows.)
MySQL 8.0 and later supports window functions that, for each row from a query, perform a calculation using rows related to that row. The following sections discuss how to use window functions, including descriptions of the OVER and WINDOW clauses.
Aurora MySQL version 5.7 doesn't support window functions.
That is, window functions perform operations on a set of rows and produce an aggregated value for each row, so each row keeps its own identity. Window functions are a new feature introduced in MySQL 8 and can improve query execution performance.
MySQL 5.7 example: the rank() function is pretty cool, but it's not available prior to MySQL 8.0, so we'll need to write a creative nested query to rank our records and provide the results.
Notice that this lack of functionality is a thing of the past with MySQL 8 and later: https://dev.mysql.com/doc/refman/8.0/en/window-functions.html
Edit: Rearranged whole answer
Now I understand what you expect.
I've created such a test table on my MySQL and this seems to work:
SELECT DISTINCT e2.event_type FROM events e1
JOIN events e2
ON e1.event_time BETWEEN e2.event_time AND (e2.event_time + INTERVAL 10 MINUTE)
GROUP BY e1.event_id, e2.event_type
-- "more than 5 events" means a strict comparison
HAVING COUNT(e2.event_type) > 5
Basically, for each event you self join events within the specified relative time window (from event_time to event_time + window duration), and then you group by e1's event_id to get an emulated floating time window. We're also grouping by event_type here because you want this field's values for each window.
All you need to think through is performance. I'm not sure it will be efficient enough for 1M records.
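To sanity-check the self-join idea on a small data set, here is a minimal sketch. It makes several assumptions that are not in the answer above: SQLite (through Python's sqlite3 module) stands in for MySQL 5.6, event_time is stored as integer seconds so that `+ ?` replaces `+ INTERVAL 10 MINUTE`, the join is restricted to events of the same type, and the window is the closed interval starting at each e1 event. The helper name busy_event_types and the sample data are hypothetical.

```python
import sqlite3

def busy_event_types(rows, duration_seconds, k):
    """Return event types with more than k events in some window of the given length."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_id INTEGER PRIMARY KEY, event_type TEXT, event_time INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    # Each e1 row anchors one window [e1.event_time, e1.event_time + duration].
    # COUNT(*) includes the anchoring event itself.
    query = """
        SELECT DISTINCT e1.event_type
        FROM events e1
        JOIN events e2
          ON e2.event_type = e1.event_type
         AND e2.event_time BETWEEN e1.event_time AND e1.event_time + ?
        GROUP BY e1.event_id, e1.event_type
        HAVING COUNT(*) > ?
        ORDER BY e1.event_type
    """
    result = [row[0] for row in conn.execute(query, (duration_seconds, k))]
    conn.close()
    return result

# Hypothetical sample data: event times in seconds, per event type.
SAMPLE_TIMES = {
    "login": [0, 60, 120, 180, 240, 300],       # 6 events within 10 minutes
    "logout": [0, 300, 600, 900, 1200, 1500],   # at most 3 per 10 minutes
    "click": [10, 20, 30, 40, 50],              # exactly 5 events
}
SAMPLE_ROWS = [
    (event_id, event_type, t)
    for event_id, (event_type, t) in enumerate(
        ((etype, t) for etype, times in SAMPLE_TIMES.items() for t in times),
        start=1,
    )
]

if __name__ == "__main__":
    print(busy_event_types(SAMPLE_ROWS, 600, 5))
```

Because the busiest window for a type can always be shifted to start at one of that type's events, anchoring windows only at same-type events should not miss any qualifying type, and it keeps the join smaller. On MySQL, an index on (event_type, event_time) would likely help, but that is worth confirming with EXPLAIN.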
MySQL 5.6 has no window function support, but you can use a correlated subquery in the SELECT list to retrieve exactly one column:
SELECT
event_id,
event_type,
event_time,
(SELECT COUNT(*) FROM events EC WHERE EC.event_type = E.event_type AND EC.event_time > E.event_time) AS subsequent_event_count
FROM
events E
WHERE ...
Do run EXPLAIN on it. In terms of execution logic, this is roughly the same as CROSS APPLY in SQL Server.
Another approach is a self join:
SELECT
E.event_id,
E.event_type,
E.event_time,
COUNT(EC.event_id) AS subsequent_event_count
FROM
events E
LEFT JOIN events EC
ON E.event_type = EC.event_type AND E.event_time < EC.event_time
GROUP BY
E.event_id,
E.event_type,
E.event_time
Do test both approaches for performance.
You can also use more creative join conditions, such as:
EC.event_time > E.event_time AND EC.event_time < E.event_time + INTERVAL 1 DAY
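The bounded condition above can be dropped into the correlated subquery form to count, for each event, the later same-type events inside a window. This is only a sketch under assumptions: SQLite via Python's sqlite3 stands in for MySQL, event_time is stored as integer seconds (so `+ ?` takes the place of `+ INTERVAL 1 DAY`), and the function name following_counts is hypothetical.

```python
import sqlite3

def following_counts(rows, window_seconds):
    """Map event_id to the number of later same-type events inside the window."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_id INTEGER PRIMARY KEY, event_type TEXT, event_time INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    # Both bounds are strict, matching the join condition quoted above,
    # so the event itself is never counted.
    query = """
        SELECT E.event_id,
               (SELECT COUNT(*)
                  FROM events EC
                 WHERE EC.event_type = E.event_type
                   AND EC.event_time > E.event_time
                   AND EC.event_time < E.event_time + ?) AS subsequent_event_count
        FROM events E
        ORDER BY E.event_id
    """
    counts = dict(conn.execute(query, (window_seconds,)).fetchall())
    conn.close()
    return counts

if __name__ == "__main__":
    # Hypothetical rows: (event_id, event_type, event_time in seconds).
    demo_rows = [(1, "a", 0), (2, "a", 10), (3, "a", 100), (4, "b", 5)]
    print(following_counts(demo_rows, 50))
```

To answer the original question with this form, one could keep the event types where subsequent_event_count + 1 (counting the anchoring event) exceeds k.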