I have a query that looks like the following:
SELECT time_start, some_count
FROM foo
WHERE user_id = 1
AND DATE(time_start) = '2016-07-27'
ORDER BY some_count DESC, time_start DESC LIMIT 1;
This returns one row: the highest some_count for user_id = 1, together with the most recent time_start for that count. The same some_count can occur for multiple time_start values, and I want the most recent one.
What I'm trying to do now is run a query that figures this out for every user_id that occurred at least once on a specific date, in this case 2016-07-27. Ultimately it will probably require a GROUP BY, as I'm looking for a group maximum per user_id.
What's the best way to write a query of that nature?
To do that, you can use the ROW_NUMBER() function. In OVER(), you specify the groups into which the rows should be divided (PARTITION BY) and the order in which the numbers should be assigned to the rows (ORDER BY). The row numbers are assigned within each group (here, each user_id).
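To make the ROW_NUMBER() idea concrete, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. The sample rows, the alias rn and the helper top_rows_per_user are made up for illustration, and I'm assuming an engine with window functions (SQLite 3.25+ here, MySQL 8.0+ on the MySQL side).

```python
import sqlite3

# Hypothetical sample data; column names follow the question's table foo.
ROWS = [
    (1, "2016-07-27 08:00:00", 5),
    (1, "2016-07-27 10:00:00", 7),
    (1, "2016-07-27 12:00:00", 7),  # ties on some_count, newer time_start
    (1, "2016-07-28 08:00:00", 9),  # different date, must be ignored
    (2, "2016-07-27 09:00:00", 3),
    (3, "2016-07-26 09:00:00", 8),  # user with no rows on the target date
]

QUERY = """
SELECT user_id, time_start, some_count
FROM (
    SELECT user_id, time_start, some_count,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY some_count DESC, time_start DESC
           ) AS rn
    FROM foo
    WHERE DATE(time_start) = ?
) AS ranked
WHERE rn <= ?
ORDER BY user_id, rn
"""


def top_rows_per_user(day, n=1):
    """Return the top n rows per user_id for the given day (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE foo (user_id INTEGER, time_start TEXT, some_count INTEGER)"
    )
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY, (day, n)).fetchall()
    conn.close()
    return result


print(top_rows_per_user("2016-07-27"))
```

Passing n=2 instead of the default returns the two best rows per user, so the same query covers the "top N per group" case.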
The GROUP BY clause can help with that, but it is limited to the single top result for each group. If you want the top 5 per category, GROUP BY won't help by itself. That doesn't mean it can't be done: a Top N query per group can be constructed in other ways.
The first way to find the first row of each group is by using a correlated subquery. In short, a correlated subquery is a type of subquery that is executed row by row. It uses the values from the outer query, that is, the values from the query it's nested into.
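A rough sketch of the correlated-subquery variant, again run through Python's sqlite3 module so it can be tried without a MySQL server. The sample rows and the helper first_row_per_user are hypothetical, and the query assumes time_start is unique per user, since it is used to match the outer row against the subquery result.

```python
import sqlite3

# Hypothetical sample data; column names follow the question's table foo.
ROWS = [
    (1, "2016-07-27 08:00:00", 5),
    (1, "2016-07-27 10:00:00", 7),
    (1, "2016-07-27 12:00:00", 7),  # ties on some_count, newer time_start
    (1, "2016-07-28 08:00:00", 9),  # different date, must be ignored
    (2, "2016-07-27 09:00:00", 3),
    (3, "2016-07-26 09:00:00", 8),  # user with no rows on the target date
]

# The inner query is re-evaluated for each outer row and refers to f.user_id.
QUERY = """
SELECT f.user_id, f.time_start, f.some_count
FROM foo AS f
WHERE DATE(f.time_start) = :day
  AND f.time_start = (
      SELECT s.time_start
      FROM foo AS s
      WHERE s.user_id = f.user_id
        AND DATE(s.time_start) = :day
      ORDER BY s.some_count DESC, s.time_start DESC
      LIMIT 1
  )
ORDER BY f.user_id
"""


def first_row_per_user(day):
    """Return the best row per user_id for the given day (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE foo (user_id INTEGER, time_start TEXT, some_count INTEGER)"
    )
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY, {"day": day}).fetchall()
    conn.close()
    return result


print(first_row_per_user("2016-07-27"))
```

The subquery reuses the ORDER BY ... LIMIT 1 from the single-user query in the question, just with the hard-coded user_id replaced by a reference to the outer row.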
I am sharing two of my approaches.
Approach #1 (scalable):
Using MySQL user-defined variables
SELECT
    t.user_id,
    t.time_start,
    t.time_stop,
    t.some_count
FROM
(
    SELECT
        user_id,
        time_start,
        time_stop,
        some_count,
        IF(@sameUser = user_id, @rn := @rn + 1,
            IF(@sameUser := user_id, @rn := 1, @rn := 1)
        ) AS row_number
    FROM foo
    CROSS JOIN (
        SELECT
            @sameUser := - 1,
            @rn := 1
    ) var
    WHERE DATE(time_start) = '2016-07-27'
    ORDER BY user_id, some_count DESC, time_stop DESC
) AS t
WHERE t.row_number <= 1
ORDER BY t.user_id;
This is scalable because if you want the top n rows for each user, you only need to change this line:
... WHERE t.row_number <= n ...
I can add an explanation later if the query provides the expected result.
Approach #2 (not scalable):
Using INNER JOIN and GROUP BY
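-- Note: F.time_start is neither aggregated nor in the GROUP BY, so MySQL
-- rejects this query when ONLY_FULL_GROUP_BY is enabled; otherwise the
-- returned time_start is picked arbitrarily among the tied rows.
SELECT
    F.user_id,
    F.some_count,
    F.time_start,
    MAX(F.time_stop) AS max_time_stop
FROM foo F
INNER JOIN
(
    SELECT
        user_id,
        MAX(some_count) AS max_some_count
    FROM foo
    WHERE DATE(time_start) = '2016-07-27'
    GROUP BY user_id
) AS t
    ON F.user_id = t.user_id AND F.some_count = t.max_some_count
WHERE DATE(F.time_start) = '2016-07-27'
GROUP BY F.user_id;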
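For comparison, here is a small sketch of the join-plus-GROUP BY idea in which every selected column is either grouped or aggregated. It is not the query above verbatim: it takes MAX(time_start) as the tie-breaker, following the question, and the sample rows and the helper max_row_per_user are invented for the example. It runs on an in-memory SQLite database through Python.

```python
import sqlite3

# Hypothetical sample data; column names follow the question's table foo.
ROWS = [
    (1, "2016-07-27 08:00:00", 5),
    (1, "2016-07-27 10:00:00", 7),
    (1, "2016-07-27 12:00:00", 7),  # ties on some_count, newer time_start
    (1, "2016-07-28 08:00:00", 9),  # different date, must be ignored
    (2, "2016-07-27 09:00:00", 3),
    (3, "2016-07-26 09:00:00", 8),  # user with no rows on the target date
]

# Step 1 (derived table t): highest some_count per user on the given day.
# Step 2 (outer query): among rows with that count, take the latest time_start.
QUERY = """
SELECT f.user_id, MAX(f.time_start) AS latest_time_start, f.some_count
FROM foo AS f
JOIN (
    SELECT user_id, MAX(some_count) AS max_some_count
    FROM foo
    WHERE DATE(time_start) = :day
    GROUP BY user_id
) AS t
  ON f.user_id = t.user_id AND f.some_count = t.max_some_count
WHERE DATE(f.time_start) = :day
GROUP BY f.user_id, f.some_count
ORDER BY f.user_id
"""


def max_row_per_user(day):
    """Return (user_id, latest time_start, max some_count) per user (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE foo (user_id INTEGER, time_start TEXT, some_count INTEGER)"
    )
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY, {"day": day}).fetchall()
    conn.close()
    return result


print(max_row_per_user("2016-07-27"))
```

As with Approach #2, this only yields one row per user; it does not generalize to the top n rows.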
You can use NOT EXISTS():
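SELECT * FROM foo t
WHERE (DATE(t.time_start) = '2016-07-27'
       OR DATE(t.time_stop) = '2016-07-27')
  AND NOT EXISTS(SELECT 1 FROM foo s
                 WHERE t.user_id = s.user_id
                   -- compare only against rows from the same date
                   AND (DATE(s.time_start) = '2016-07-27'
                        OR DATE(s.time_stop) = '2016-07-27')
                   AND (s.some_count > t.some_count
                        OR (s.some_count = t.some_count
                            AND s.time_stop > t.time_stop)));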
NOT EXISTS() keeps only those records for which no other record of the same user exists with a larger count, or with the same count but a newer time_stop.
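One thing worth checking with this pattern is whether the subquery is limited to the same date as the outer query. The sketch below, run on an in-memory SQLite database from Python, shows the difference. The sample rows, the helper best_rows and its restrict_inner flag are hypothetical, and I'm using time_start as the tie-breaker, as in the question.

```python
import sqlite3

# Hypothetical sample data; column names follow the question's table foo.
ROWS = [
    (1, "2016-07-27 08:00:00", 5),
    (1, "2016-07-27 10:00:00", 7),
    (1, "2016-07-27 12:00:00", 7),  # ties on some_count, newer time_start
    (1, "2016-07-28 08:00:00", 9),  # higher count, but on a different date
    (2, "2016-07-27 09:00:00", 3),
]

INNER_DATE_FILTER = "AND DATE(s.time_start) = :day"

QUERY_TEMPLATE = """
SELECT t.user_id, t.time_start, t.some_count
FROM foo AS t
WHERE DATE(t.time_start) = :day
  AND NOT EXISTS (
      SELECT 1
      FROM foo AS s
      WHERE s.user_id = t.user_id
        {inner_filter}
        AND (s.some_count > t.some_count
             OR (s.some_count = t.some_count
                 AND s.time_start > t.time_start))
  )
ORDER BY t.user_id
"""


def best_rows(day, restrict_inner=True):
    """Run the NOT EXISTS query, optionally without the inner date filter."""
    query = QUERY_TEMPLATE.format(
        inner_filter=INNER_DATE_FILTER if restrict_inner else ""
    )
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE foo (user_id INTEGER, time_start TEXT, some_count INTEGER)"
    )
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)
    result = conn.execute(query, {"day": day}).fetchall()
    conn.close()
    return result


print(best_rows("2016-07-27"))                        # both users are returned
print(best_rows("2016-07-27", restrict_inner=False))  # user 1 is lost
```

Without the inner date filter, user 1 disappears from the result, because the row from 2016-07-28 has a higher some_count and rules out every row from 2016-07-27.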