Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Limiting GROUP BY based on COUNT() values in mySQL

Tags:

sql

php

mysql

rdbms

I'm logging events into a mySQL database and would like to get the top 3 events for monitoring purposes.

My table eventlog looks like this:

+----+------------------+---------------------+
| id |    eventname     |      eventdate      |
+----+------------------+---------------------+
|  0 | machine1.started | 2016-09-04 19:22:23 |
|  1 | machine2.reboot  | 2016-09-04 20:23:11 |
|  2 | machine1.stopped | 2016-09-04 20:24:12 |
|  3 | machine1.started | 2016-09-04 20:25:12 |
|  4 | machine1.stopped | 2016-09-04 23:23:16 |
|  5 | machine0.started | 2016-09-04 23:24:00 |
|  6 | machine1.started | 2016-09-04 23:24:16 |
|  7 | machine3.started | 2016-09-04 23:25:00 |
|  8 | machine4.started | 2016-09-04 23:26:00 |
|  9 | cluster.alive    | 2016-09-04 23:30:00 |
| 10 | cluster.alive    | 2016-09-05 11:30:00 |
+----+------------------+---------------------+

The query should eventually return the following, holding

  • the top 3 events that occurred most often (based on column eventcounts that is generated by mySQL's COUNT() function), grouped by their eventname
  • only 2 rows where eventcount = 1, but only if 1 is within the top 3 eventcounts (since there are a lot of events that occur just once and therefore would overload my frontend)

Example of the desired result, based on the above table:

+------------+------------------+
| eventcount |    eventname     |
+------------+------------------+
|          3 | machine1.started |
|          2 | machine1.stopped |
|          2 | cluster.alive    |
|          1 | machine0.started |
|          1 | machine2.started |
+------------+------------------+

Please note that I do not need just 3 returned rows but the rows with the 3 highest eventcounts.

I did a lot of experimenting by messing around with the query string below, including multiple selects and questionable CASE ... WHEN conditions, but wasn't able to make it work the way I need.

SELECT COUNT(id) AS 'eventcount', eventname
FROM eventlog
GROUP BY eventname
ORDER BY eventcount DESC;

What is the best approach to get the desired result in a performant way?

like image 720
IggyBlob Avatar asked Oct 18 '22 02:10

IggyBlob


1 Answers

here is one way of doing it using variables SQL Fiddle for it: http://sqlfiddle.com/#!9/b3458b/16

SELECT
  t2.eventcount
  ,t2.eventname
FROM
(
  SELECT
      t.eventname
      ,t.eventcount
      ,@Rank:=IF(@PrevCount=t.eventcount,@Rank,@Rank+1) Rank
      ,@CountRownum:=IF(@PrevCount=t.eventcount,@CountRowNum + 1,1) CountRowNum
      ,@PrevCount:= t.eventcount
    FROM
      (
        SELECT
          l.eventname
          ,COUNT(*) as eventcount
        FROM
          eventlog l
        GROUP BY
          l.eventname
        ORDER BY
          COUNT(*) DESC
      ) t
      CROSS JOIN (SELECT @Rank:=0, @CountRowNum:=0, @PrevCount:=-1) var
    ORDER BY
      t.eventcount DESC
) t2
WHERE
  t2.Rank < 4
  AND NOT (t2.eventcount = 1 AND t2.CountRowNum > 2)
like image 112
Matt Avatar answered Nov 02 '22 10:11

Matt