
MySQL workaround for window functions

Tags:

sql

mysql

I have an event table that has the following fields:

event_id
event_type 
event_time

Given a duration D and a number K, I need a count of all the event_type values that had more than K events in any relative time window of duration D. This basically requires a sliding window with respect to each event. For example, I want all the event_type values that had more than 5 events in any 10-minute window.

I am not sure how to work around this without window functions.

(I am on MySQL 5.6. The dataset is under 1 million rows.)

smartnut007 asked May 31 '16 06:05




3 Answers

Note that this lack of functionality is a thing of the past with MySQL 8 and later: https://dev.mysql.com/doc/refman/8.0/en/window-functions.html
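
For readers who can upgrade, here is a minimal sketch of how the question's example (more than 5 events in any 10 minutes) might look with a window function. It assumes MySQL 8.0 or later, a table named events as in the other answers, and a DATETIME event_time column. The alias events_in_window is made up for illustration.

```sql
-- Sketch only: assumes MySQL 8.0+, table `events`, DATETIME `event_time`.
SELECT DISTINCT event_type
FROM (
    SELECT
        event_type,
        -- Count same-type events in the 10 minutes up to and including this row.
        COUNT(*) OVER (
            PARTITION BY event_type
            ORDER BY event_time
            RANGE BETWEEN INTERVAL 10 MINUTE PRECEDING AND CURRENT ROW
        ) AS events_in_window
    FROM events
) AS w
WHERE events_in_window > 5;
```

Each row's frame ends at that row's event_time, so a window is evaluated relative to every event, which is the sliding behaviour the question asks for.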

Lukas Eder answered Oct 05 '22 19:10


Edit: Rearranged whole answer

Now I understand what you expect.

I created a test table like this in MySQL, and the following seems to work:

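SELECT e2.event_type
FROM events e1
JOIN events e2
    -- e2 falls in the 10 minutes up to and including e1's event_time
    ON e1.event_time BETWEEN e2.event_time AND (e2.event_time + INTERVAL 10 MINUTE)
GROUP BY e1.event_id, e2.event_type
-- "more than 5 events", as the question asks
HAVING COUNT(e2.event_type) > 5;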

Basically, each event (e1) is self-joined to the events (e2) that fall inside a relative time window (from e2's event_time to event_time + window duration), and grouping by e1's event_id emulates a sliding time window. We also group by event_type here because you want that field's value for each window.

The one thing you need to think through is performance. I'm not sure it will be efficient enough for 1M records.
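
To see the self join at work, here is a small worked example. The sample rows, the column types, and the lowered threshold (more than 2 events instead of more than 5) are assumptions chosen to keep the data short. DISTINCT is added so each qualifying event_type is listed once.

```sql
-- Hypothetical test table and data, for illustration only.
CREATE TABLE events (
    event_id   INT PRIMARY KEY,
    event_type VARCHAR(20) NOT NULL,
    event_time DATETIME NOT NULL
);

INSERT INTO events (event_id, event_type, event_time) VALUES
    (1, 'login',  '2016-05-31 06:00:00'),
    (2, 'login',  '2016-05-31 06:03:00'),
    (3, 'login',  '2016-05-31 06:08:00'),
    (4, 'logout', '2016-05-31 06:00:00'),
    (5, 'logout', '2016-05-31 06:30:00'),
    (6, 'logout', '2016-05-31 07:00:00');

-- For every anchor event e1, count the e2 rows of each type that lie in
-- the 10 minutes up to and including e1's event_time.
SELECT DISTINCT e2.event_type
FROM events e1
JOIN events e2
    ON e1.event_time BETWEEN e2.event_time AND (e2.event_time + INTERVAL 10 MINUTE)
GROUP BY e1.event_id, e2.event_type
HAVING COUNT(*) > 2;
```

With this data only 'login' comes back: the window ending at event 3 holds three logins, while no 10-minute window holds more than one logout.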

Jakub Matczak answered Oct 05 '22 20:10


MySQL before 8.0 has no window function support, but you can use a correlated subquery in the SELECT list to retrieve exactly one column:

SELECT
  event_id,
  event_type, 
  event_time,
  (SELECT COUNT(*) FROM events EC WHERE EC.event_type = E.event_type AND EC.event_time > E.event_time) AS subsequent_event_count
FROM
  events E
WHERE ...

Do run EXPLAIN on it. In terms of execution logic, this is roughly the same as CROSS APPLY in SQL Server.
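
As a rough sketch of what that check could look like: the composite index below is an assumption (it is not part of the question's schema) and its name idx_type_time is made up. Whether the optimizer actually picks it depends on your data.

```sql
-- Hypothetical index covering the two columns the subquery filters on.
ALTER TABLE events ADD INDEX idx_type_time (event_type, event_time);

-- Inspect the plan for the correlated subquery.
EXPLAIN
SELECT
  E.event_id,
  (SELECT COUNT(*)
   FROM events EC
   WHERE EC.event_type = E.event_type
     AND EC.event_time > E.event_time) AS subsequent_event_count
FROM
  events E;
```

If the dependent subquery's row in the plan shows no usable key, expect one scan of the table per outer row, which will hurt as the table approaches a million rows.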

Another approach is a self join:

SELECT
  E.event_id,
  E.event_type,
  E.event_time,
  COUNT(EC.event_id) AS subsequent_event_count
FROM
  events E
  LEFT JOIN events EC
    ON E.event_type = EC.event_type AND E.event_time < EC.event_time
GROUP BY
  E.event_id,
  E.event_type,
  E.event_time

Do test both approaches for performance.

You can do much more creative joins, like

EC.event_time > E.event_time AND EC.event_time < E.event_time + INTERVAL 1 DAY
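
Put together for the question's example, a bounded join could look roughly like the sketch below. It assumes the same events table, a 10-minute window, and a threshold of more than 5. Unlike the fragment above, it uses >= on the lower bound so that the anchoring event counts towards its own window.

```sql
-- Sketch: event types with more than 5 events in some 10-minute window
-- that starts at one of their own events.
SELECT DISTINCT E.event_type
FROM
  events E
  JOIN events EC
    ON EC.event_type = E.event_type
   AND EC.event_time >= E.event_time
   AND EC.event_time <  E.event_time + INTERVAL 10 MINUTE
GROUP BY
  E.event_id,
  E.event_type
HAVING COUNT(*) > 5;
```

Anchoring windows only at events of the same type is enough here, because any window containing more than 5 events of a type can be shifted to start at the first of those events without losing any of them.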
Pred answered Oct 05 '22 19:10