I need to select several rows per each distinct record. Pretty much something like asked here Select first n records for each distinct ID in SQL Server 2008, although I use MySQL.
The purpose can be achieved by running 21 queries in this case: 1 general and 20 to get the subrecords, i.e. something like this:
SELECT DISTINCT `user_id`
FROM `posts`
WHERE `deleted` = '0'
ORDER BY `user_id` ASC
LIMIT 20
... to select all rows needed, and then
SELECT *
FROM `posts`
WHERE `deleted` = '0'
AND `user_id` = ?
ORDER BY `id` DESC
LIMIT 5
... in a loop per each row selected by the first query.
Basically, I sort of need to get 5 posts of each user. I need this done in a single query. The posts
setup is just for example, I made this up so it's, hopefully, easier to understand what I need.
I started with the following query:
SELECT *
FROM `posts`
WHERE `user_id`
IN (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4
)
LIMIT 5
But I get the #1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
error.
So I've tried the JOIN
idea like suggested here:
SELECT posts.id,
posts.user_id,
NULL
FROM (
SELECT posts.user_id
FROM posts
WHERE posts.deleted = 0
LIMIT 20
) q
JOIN posts
ON posts.user_id = q.user_id
I've also tried several nested queries as suggested here:
SELECT *
FROM posts
WHERE user_id IN (
SELECT * FROM (
SELECT user_id
FROM posts
LIMIT 20
)
as t);
And the other solutions found on the Internet. But they either do not work or just simply select the first N rows from the database (regardless of the conditions and joins for some reason). Tried LEFT JOIN
, RIGHT JOIN
, even INNER JOIN
, but still no success.
Please help.
UPDATE Forgot to mention that the table is around 5GB in size.
UPDATE Tried the sub-sub query:
SELECT *
FROM `posts`
WHERE
`user_id` IN ( SELECT `user_id` FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
)
LIMIT 5
Same as above, it returns the following:
+----+---------+------+
| id | user_id | post |
+----+---------+------+
| 1 | 1 | a |
+----+---------+------+
| 2 | 1 | b |
+----+---------+------+
| 3 | 1 | c |
+----+---------+------+
| .. | .. | .. |
I.e. the 5 (which is what the outer LIMIT
is set to) rows for the same user. The weird thing is that if I run the sub and sub-sub query alone:
SELECT `user_id` FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
I get the 4 distinct values:
+---------+
| user_id |
+---------+
| 1 |
+---------+
| 2 |
+---------+
| 3 |
+---------+
| 4 |
+---------+
You have to use variables, doing two different counts on an ordered query: one on the number of the posts for each user, and one for the users:
SELECT posts_counts.*
FROM (
SELECT
posts.*,
@post_count:=case when @prec_user_id=user_id then @post_count+1 else 1 end as pc,
case when @prec_user_id<>user_id then @user_count:=@user_count+1 else @user_count end as uc,
@prec_user_id:=user_id
FROM
posts,
(select @prec_user_id:=0, @user_count:=0, @post_count:=0) counts
ORDER BY
posts.user_id ) posts_counts
WHERE pc<5 and uc<4
EDIT: You might also consider trying this query:
SELECT *
FROM `posts`
WHERE
`user_id` IN ( SELECT user_id FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
)
LIMIT 5
(this will justs select 5 posts, out of all posts from each selected user, so it's still not what you need, but it uses a trick to use a LIMIT in a sub-subquery)
EDIT2: Next query will limit 5 posts for each of the 20 users:
select posts_limited.*
from (
select
posts.*,
@row:=if(@last_user=posts.user_id, @row+1, 1) as row,
@last_user:=posts.user_id
from
posts inner join
(select user_id from
(select distinct user_id
from posts
order by user_id desc
LIMIT 20) limit_users
) limit_users
on posts.user_id = limit_users.user_id,
(select @last_user:=0, @row:=0) r
) posts_limited
where row<=5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With