Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get count for each group, but stop counting after N result rows in each group

Tags:

mysql

mariadb

I'm trying to optimize a query that (unnecessarily) counts through almost 900 000 rows in a table, which takes way too long.

The table contains log entries for events taking place in different parts of a web app, and I want to know how many unread log entries exist for each log type when the row count for that type is 1000 or less, but count at most 1001 rows if the count is 1001 or more.

I don't need to count any more after that, I'll just output "more than 1000" for that log type.

Let's say we have the following table called my_logs with data:

id    log_type    log_text   is_read
1     'Type 1'    'Text 1'   1
2     'Type 1'    'Text 2'   1
3     'Type 1'    'Text 3'   0
4     'Type 1'    'Text 4'   0
5     'Type 1'    'Text 5'   0
6     'Type 1'    'Text 6'   0
7     'Type 2'    'Text 7'   0
8     'Type 2'    'Text 8'   0

In this example, my current query would look like this:

SELECT log_type, COUNT(*) AS unread FROM my_logs WHERE is_read = 0 GROUP BY log_type;

This query counts every row, and gives the correct amount of rows for each log type of course. The problem is that when the table contains 900 000 rows, this is an expensive query, and counting more than 1000 rows of each type is totally unneccessary as users won't care about the difference between 1 000 and 20 000, they'll just see a lot of entries.

This is the closest I got to a solution (limit adjusted to fit my_logs example and demonstrate usage):

SELECT log_type, COUNT(*) AS unread
FROM (
    SELECT log_type
    FROM my_logs ml1
    WHERE is_read = 0
    LIMIT 3 /* To display "more than 2" in webapp */
) AS ml2
GROUP BY logtype_txt;

but this query pools together all log_types in the inner query and limits that to 1001 rows, which is not what I want. I need to split the rows into each log_type, and then count max 1001 rows. The output I want in this example would be:

log_type    unread
'Type 1'    3
'Type 2'    2

This question and this question discuss how to stop counting when n rows are found, but don't take into account the grouping I need.

Does anyone know a solution?

like image 894
Adrian Wiik Avatar asked Nov 07 '22 07:11

Adrian Wiik


1 Answers

This answer does not work in MariaDB or MySQL.

The answer you are looking for is based on a "lateral table expression". This is implemented in Oracle, DB2, PostgreSQL, and SQL Server.

Here's the query that would be optimal in terms of rows read from the table, in PostgreSQL:

select x.log_type, count(y.z)
from (
  select distinct log_type as log_type
  from my_log
) x
left join lateral (
  select 1 as z
  from my_log b 
  where b.log_type = x.log_type and is_read = 0
  limit 2 + 1
) y on true
group by x.log_type

See running example at DB Fiddle.

Lateral queries are executed once according to values available on table expression placed before them. IN this case the table expression x will produce all different values for log_type (using the index for performance). Then the lateral query will be executed once per each value from x, with a LIMIT of 3 (in this case). Finally the query counts how many z values where encountered.

As you see, the process above only reads a maximum of 3 rows per type.

like image 123
The Impaler Avatar answered Nov 15 '22 05:11

The Impaler