
SELECT several records for each DISTINCT one in a single query

I need to select several rows for each distinct record, much like what is asked in "Select first n records for each distinct ID in SQL Server 2008", except that I use MySQL.

In this case it can be achieved by running 21 queries: 1 general query and 20 more to get the sub-records, i.e. something like this:

SELECT DISTINCT `user_id`
FROM `posts`
WHERE `deleted` = '0'
ORDER BY `user_id` ASC
LIMIT 20

... to select the users needed, and then

SELECT *
FROM `posts`
WHERE `deleted` = '0'
AND `user_id` = ?
ORDER BY `id` DESC
LIMIT 5

... in a loop, once for each row returned by the first query.

Basically, I need to get 5 posts for each user, and I need this done in a single query. The posts setup is just an example that I made up so that it is, hopefully, easier to understand what I need.
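For illustration only, the 1 + 20 query loop described above might look like the following Python sketch. It uses an in-memory SQLite database as a stand-in for MySQL; the sample rows and the function name `latest_posts_per_user` are made up for the example.

```python
import sqlite3


def latest_posts_per_user(conn, max_users=20, posts_per_user=5):
    """Run 1 query for the user ids, then 1 more query per user."""
    users = conn.execute(
        "SELECT DISTINCT user_id FROM posts WHERE deleted = 0 "
        "ORDER BY user_id ASC LIMIT ?",
        (max_users,),
    ).fetchall()
    result = {}
    for (user_id,) in users:
        # One extra round trip per user: this is the part to avoid.
        result[user_id] = conn.execute(
            "SELECT id, post FROM posts WHERE deleted = 0 AND user_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (user_id, posts_per_user),
        ).fetchall()
    return result


# Hypothetical sample data: 3 users with 7 posts each (ids 1..21).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "post TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO posts (user_id, post) VALUES (?, ?)",
    [(u, f"post {n} of user {u}") for u in (1, 2, 3) for n in range(7)],
)
posts = latest_posts_per_user(conn, max_users=2)
```

With 20 users this issues 21 statements, which is the cost a single-query solution is meant to remove.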

I started with the following query:

SELECT * 
FROM `posts` 
WHERE `user_id` 
IN (
    SELECT DISTINCT `user_id` 
    FROM `posts` 
    ORDER BY `user_id` DESC 
    LIMIT 4
) 
LIMIT 5

But I get the error "#1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'".

So I tried the JOIN idea, as suggested here:

SELECT  posts.id,
        posts.user_id,
        NULL
FROM    (
        SELECT  posts.user_id
        FROM    posts
        WHERE   posts.deleted = 0
        LIMIT 20
        ) q
JOIN    posts
ON      posts.user_id = q.user_id

I've also tried several nested queries as suggested here:

SELECT *
FROM posts 
WHERE user_id IN (
      SELECT * FROM (
            SELECT user_id 
            FROM posts 
            LIMIT 20
      ) 
      as t);

I have also tried other solutions found on the Internet, but they either do not work or simply select the first N rows from the table, regardless of the conditions and joins. I tried LEFT JOIN, RIGHT JOIN and even INNER JOIN, still with no success.

Please help.

UPDATE: I forgot to mention that the table is around 5GB in size.

UPDATE: I tried the sub-subquery:

SELECT * 
FROM `posts` 
WHERE
  `user_id` IN ( SELECT `user_id` FROM (
     SELECT DISTINCT `user_id` 
     FROM `posts` 
    ORDER BY `user_id` DESC 
    LIMIT 4 ) limit_users
  ) 
LIMIT 5

Same as above, it returns the following:

+----+---------+------+
| id | user_id | post |
+----+---------+------+
|  1 |       1 |    a |
|  2 |       1 |    b |
|  3 |       1 |    c |
| .. |      .. |   .. |

I.e. 5 rows (which is what the outer LIMIT is set to), all for the same user. The weird thing is that if I run the subquery and sub-subquery on their own:

    SELECT `user_id` FROM (
     SELECT DISTINCT `user_id` 
     FROM `posts` 
    ORDER BY `user_id` DESC 
    LIMIT 4 ) limit_users

I get the 4 distinct values:

+---------+
| user_id |
+---------+
|       1 |
|       2 |
|       3 |
|       4 |
+---------+

asked Nov 13 '22 14:11 by mehov


1 Answer

You have to use user variables and keep two different counters while reading the rows ordered by user: one that numbers the posts of each user, and one that numbers the users:

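SELECT posts_counts.*
FROM (
  SELECT
    posts.*,
    -- pc: number of the post within the current user (starts at 1)
    @post_count:=case when @prec_user_id=user_id then @post_count+1 else 1 end as pc,
    -- uc: number of the user (starts at 1, grows each time user_id changes)
    case when @prec_user_id<>user_id then @user_count:=@user_count+1 else @user_count end as uc,
    -- remember this row's user for the comparison on the next row
    @prec_user_id:=user_id
  FROM
    posts,
    (select @prec_user_id:=0, @user_count:=0, @post_count:=0) counts
  ORDER BY
    posts.user_id ) posts_counts
-- both counters start at 1, so <= keeps 5 posts for each of 4 users
WHERE pc<=5 and uc<=4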
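To make the two counters easier to follow, here is a small Python sketch of the same bookkeeping outside the database. It is only a model of the idea: the function name `number_rows` and the sample `(id, user_id)` pairs are made up, and the rows are assumed to be already sorted by `user_id`, as the ORDER BY is meant to ensure.

```python
def number_rows(rows):
    """Mimic the pc / uc counters on rows already sorted by user_id."""
    prev_user_id = None  # plays the role of @prec_user_id
    post_count = 0       # plays the role of @post_count (pc)
    user_count = 0       # plays the role of @user_count (uc)
    numbered = []
    for post_id, user_id in rows:
        if user_id == prev_user_id:
            post_count += 1   # same user: next post number
        else:
            post_count = 1    # new user: restart the post counter
            user_count += 1   # ... and move on to the next user number
        prev_user_id = user_id
        numbered.append((post_id, user_id, post_count, user_count))
    return numbered


# Hypothetical (id, user_id) pairs, sorted by user_id.
rows = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3)]
numbered = number_rows(rows)

# Keep at most 2 posts for each of the first 2 users.
kept = [r for r in numbered if r[2] <= 2 and r[3] <= 2]
```

Both counters start at 1, so a strict `<` comparison against N keeps N - 1 posts or users, while `<=` keeps N.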

EDIT: You might also consider trying this query:

SELECT * 
FROM `posts` 
WHERE
  `user_id` IN ( SELECT user_id FROM (
     SELECT DISTINCT `user_id` 
     FROM `posts` 
    ORDER BY `user_id` DESC 
    LIMIT 4 ) limit_users
  ) 
LIMIT 5

(This will select just 5 posts in total, taken from all the posts of the selected users, so it is still not what you need, but it shows a trick that allows a LIMIT inside a sub-subquery.)

EDIT2: The next query limits the result to 5 posts for each of the 20 users:

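select posts_limited.*
from (
  select
    posts.*,
    -- row_num restarts at 1 whenever user_id changes
    -- (alias is row_num because ROW is a reserved word since MySQL 8.0.2)
    @row:=if(@last_user=posts.user_id, @row+1, 1) as row_num,
    @last_user:=posts.user_id
  from
    posts inner join
    (select user_id from
      (select distinct user_id
       from posts
       order by user_id desc
       LIMIT 20) limit_users
    ) limit_users
    on posts.user_id = limit_users.user_id,
    (select @last_user:=0, @row:=0) r
  -- the counter is only meaningful if each user's rows are read together
  order by posts.user_id
  ) posts_limited
  where row_num<=5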
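As a side note, on MySQL 8.0 and later the same "N rows per group" result can probably be obtained with window functions instead of user variables. This is a different technique from the one in the answer above. The sketch below runs such a query from Python against an in-memory SQLite database (assumed to be SQLite 3.25 or newer, which supports window functions) only so that it can be executed without a MySQL server; the sample data, the `deleted` filter taken from the question, and the names `rn` and `user_rank` are assumptions.

```python
import sqlite3

# rn numbers each user's posts from newest to oldest;
# user_rank numbers the users themselves, highest user_id first.
QUERY = """
SELECT id, user_id, post
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY id DESC) AS rn,
           DENSE_RANK() OVER (ORDER BY user_id DESC) AS user_rank
    FROM posts AS p
    WHERE deleted = 0
) AS ranked
WHERE rn <= ? AND user_rank <= ?
ORDER BY user_id DESC, id DESC
"""

# Hypothetical sample data: 4 users with 7 posts each (ids 1..28).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "post TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO posts (user_id, post) VALUES (?, ?)",
    [(u, f"post {n} of user {u}") for u in (1, 2, 3, 4) for n in range(7)],
)

# 5 newest posts for each of the 2 users with the highest user_id.
top = conn.execute(QUERY, (5, 2)).fetchall()
```

Unlike the user-variable version, this form does not rely on the order in which expressions in the select list are evaluated.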
answered Nov 15 '22 05:11 by fthiella