How to SELECT the top row per group based on multiple ordering columns?

I have a query that looks like the following:

SELECT time_start, some_count
    FROM foo
    WHERE user_id = 1
    AND DATE(time_start) = '2016-07-27'
    ORDER BY some_count DESC, time_start DESC LIMIT 1;

This returns one row: the highest some_count for user_id = 1, together with the most recent time_start for that count. Several time_start values can share the same some_count, and in that case I want the most recent one.
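For concreteness, here is a minimal sketch of that behaviour. It uses Python's built-in sqlite3 module in place of MySQL purely so the example is self-contained; the table layout and sample rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the real table; only the columns used by the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (user_id INTEGER, time_start TEXT, some_count INTEGER)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [
        (1, "2016-07-27 08:00:00", 5),
        (1, "2016-07-27 09:00:00", 7),
        (1, "2016-07-27 10:00:00", 7),  # ties on some_count; the later time_start should win
        (1, "2016-07-28 11:00:00", 9),  # other date, filtered out
        (2, "2016-07-27 12:00:00", 8),  # other user, filtered out
    ],
)

# Same query as above: highest count first, then most recent time_start.
row = conn.execute(
    """
    SELECT time_start, some_count
    FROM foo
    WHERE user_id = 1
      AND DATE(time_start) = '2016-07-27'
    ORDER BY some_count DESC, time_start DESC
    LIMIT 1
    """
).fetchone()
print(row)
```

The two ORDER BY keys do the work here: some_count picks the maximum, and time_start breaks ties in favour of the most recent row.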

What I'm trying to do now is run a query that works this out for every user_id that occurred at least once on a specific date, in this case 2016-07-27. Ultimately it will probably require a GROUP BY, since I'm looking for a group-wise maximum per user_id.

What's the best way to write a query of that nature?

randombits asked Aug 10 '16 13:08


2 Answers

I am sharing two of my approaches.

Approach #1 (scalable):

Using MySQL user-defined variables

SELECT
    t.user_id,
    t.time_start,
    t.time_stop,
    t.some_count
FROM 
(
    SELECT
        user_id,
        time_start,
        time_stop,
        some_count,
        -- Number the rows of each user: increment within a user, restart at 1 on a new user_id
        IF(@sameUser = user_id, @rn := @rn + 1,
             IF(@sameUser := user_id, @rn := 1, @rn := 1)
        ) AS row_num  -- ROW_NUMBER is a reserved word in MySQL 8.0, so it is avoided as an alias

    FROM    foo
    CROSS JOIN (
        -- Initialise both variables once
        SELECT
            @sameUser := - 1,
            @rn := 1
    ) var
    WHERE   DATE(time_start) = '2016-07-27'
    ORDER BY    user_id,    some_count DESC,    time_stop DESC
) AS t
WHERE t.row_num <= 1
ORDER BY t.user_id;

This is scalable because, if you want the top n rows for each user, you only need to change this line:

... WHERE t.row_num <= n ...

I can add an explanation later if the query returns the expected result.
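On engines with window functions (MySQL 8.0+, SQLite 3.25+), the same per-user numbering can be written with ROW_NUMBER() instead of user variables. This is a different technique from the one above, not a restatement of it. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL, a hypothetical helper named top_n_per_user, and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (user_id INTEGER, time_start TEXT, time_stop TEXT, some_count INTEGER)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?)",
    [
        (1, "2016-07-27 08:00:00", "2016-07-27 08:30:00", 5),
        (1, "2016-07-27 09:00:00", "2016-07-27 09:30:00", 7),
        (1, "2016-07-27 10:00:00", "2016-07-27 10:30:00", 7),
        (2, "2016-07-27 12:00:00", "2016-07-27 12:30:00", 8),
        (2, "2016-07-27 13:00:00", "2016-07-27 13:30:00", 3),
        (2, "2016-07-28 09:00:00", "2016-07-28 09:30:00", 9),  # other date
    ],
)


def top_n_per_user(conn, day, n):
    """Return the top n rows per user_id for one date (hypothetical helper)."""
    return conn.execute(
        """
        SELECT user_id, time_start, time_stop, some_count
        FROM (
            SELECT user_id, time_start, time_stop, some_count,
                   -- Rank rows inside each user: highest count first, then latest time_stop
                   ROW_NUMBER() OVER (
                       PARTITION BY user_id
                       ORDER BY some_count DESC, time_stop DESC
                   ) AS rn
            FROM foo
            WHERE DATE(time_start) = ?
        ) AS t
        WHERE rn <= ?
        ORDER BY user_id, rn
        """,
        (day, n),
    ).fetchall()


for r in top_n_per_user(conn, "2016-07-27", 1):
    print(r)
```

Changing the last argument from 1 to n gives the top n rows per user, which mirrors the row-number filter in the query above.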


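Approach #2 (not scalable):

Using INNER JOIN and GROUP BY

SELECT 
 F.user_id,
 F.some_count,
 -- Both timestamps are aggregated so the query is valid under ONLY_FULL_GROUP_BY;
 -- each MAX is taken independently over the rows that tie on the top count
 MAX(F.time_start) AS max_time_start,
 MAX(F.time_stop) AS max_time_stop
FROM foo F
INNER JOIN 
(
    -- Highest some_count per user on the given date
    SELECT 
        user_id,
        MAX(some_count) AS max_some_count
    FROM foo
    WHERE DATE(time_start) = '2016-07-27'
    GROUP BY user_id
) AS t
ON F.user_id = t.user_id AND F.some_count = t.max_some_count
WHERE DATE(F.time_start) = '2016-07-27'
GROUP BY F.user_id, F.some_count;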
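
A runnable sketch of this join-and-aggregate idea follows, assuming Python's built-in sqlite3 module instead of MySQL and made-up sample rows. One adjustment is assumed here: time_start is wrapped in MAX() and some_count is added to the GROUP BY, so that every selected column is either grouped or aggregated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (user_id INTEGER, time_start TEXT, time_stop TEXT, some_count INTEGER)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?)",
    [
        (1, "2016-07-27 08:00:00", "2016-07-27 08:30:00", 5),
        (1, "2016-07-27 09:00:00", "2016-07-27 09:30:00", 7),
        (1, "2016-07-27 10:00:00", "2016-07-27 10:30:00", 7),
        (2, "2016-07-27 12:00:00", "2016-07-27 12:30:00", 8),
        (2, "2016-07-27 13:00:00", "2016-07-27 13:30:00", 3),
        (2, "2016-07-28 09:00:00", "2016-07-28 09:30:00", 9),  # other date
    ],
)

day = "2016-07-27"
rows = conn.execute(
    """
    SELECT f.user_id,
           f.some_count,
           MAX(f.time_start) AS max_time_start,
           MAX(f.time_stop)  AS max_time_stop
    FROM foo f
    INNER JOIN (
        -- Step 1: highest some_count per user on the given date
        SELECT user_id, MAX(some_count) AS max_some_count
        FROM foo
        WHERE DATE(time_start) = :day
        GROUP BY user_id
    ) AS t
      ON f.user_id = t.user_id AND f.some_count = t.max_some_count
    -- Step 2: among the rows that hit that maximum, keep the latest timestamps
    WHERE DATE(f.time_start) = :day
    GROUP BY f.user_id, f.some_count
    ORDER BY f.user_id
    """,
    {"day": day},
).fetchall()

for r in rows:
    print(r)
```

This form only yields one row per user, which is why it does not extend to a top-n query the way a row-numbering approach does.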

1000111 answered Nov 11 '22 19:11


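You can use NOT EXISTS():

SELECT * FROM foo t
WHERE (DATE(t.time_start) = '2016-07-27'
   OR DATE(t.time_stop) = '2016-07-27') 
  AND NOT EXISTS(SELECT 1 FROM foo s
                 WHERE t.user_id = s.user_id
                 -- Compare only against rows from the same date
                 AND (DATE(s.time_start) = '2016-07-27'
                  OR DATE(s.time_stop) = '2016-07-27')
                 AND (s.some_count > t.some_count
                  OR (s.some_count = t.some_count
                      AND s.time_stop > t.time_stop)));

The NOT EXISTS() keeps a record only if the same user has no other record on that date with a larger count, and no other record with the same count but a newer time_stop. The date condition is repeated inside the subquery so that a higher count on a different day does not eliminate a user's best row for 2016-07-27.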
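
Here is a small sketch of the NOT EXISTS() pattern, assuming Python's built-in sqlite3 module as a stand-in for MySQL, a hypothetical helper named top_rows, and made-up sample rows. The restrict_subquery flag is an illustration device: it toggles whether the date condition is also applied inside the subquery, to show how rows from other dates affect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (user_id INTEGER, time_start TEXT, time_stop TEXT, some_count INTEGER)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?)",
    [
        (1, "2016-07-27 08:00:00", "2016-07-27 08:30:00", 5),
        (1, "2016-07-27 09:00:00", "2016-07-27 09:30:00", 7),
        (1, "2016-07-27 10:00:00", "2016-07-27 10:30:00", 7),
        (2, "2016-07-27 12:00:00", "2016-07-27 12:30:00", 8),
        (2, "2016-07-27 13:00:00", "2016-07-27 13:30:00", 3),
        (2, "2016-07-28 09:00:00", "2016-07-28 09:30:00", 9),  # higher count, other date
    ],
)


def top_rows(conn, day, restrict_subquery=True):
    """Best row per user for one date via NOT EXISTS (hypothetical helper)."""
    # Optionally repeat the date condition inside the subquery.
    inner_filter = (
        "AND (DATE(s.time_start) = :day OR DATE(s.time_stop) = :day)"
        if restrict_subquery
        else ""
    )
    sql = f"""
        SELECT t.user_id, t.time_start, t.time_stop, t.some_count
        FROM foo t
        WHERE (DATE(t.time_start) = :day OR DATE(t.time_stop) = :day)
          AND NOT EXISTS (
              -- A "better" row: larger count, or same count with a newer time_stop
              SELECT 1 FROM foo s
              WHERE s.user_id = t.user_id
                {inner_filter}
                AND (s.some_count > t.some_count
                     OR (s.some_count = t.some_count
                         AND s.time_stop > t.time_stop))
          )
        ORDER BY t.user_id
    """
    return conn.execute(sql, {"day": day}).fetchall()


print(top_rows(conn, "2016-07-27"))
print(top_rows(conn, "2016-07-27", restrict_subquery=False))
```

With the subquery restricted to the same date, both users get their best row for that day. Without the restriction, user 2 drops out in this sample data because a row from 2016-07-28 has a larger count.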

sagi answered Nov 11 '22 18:11