
Limit query by count distinct column values

I have a table with people, something like this:

ID PersonId SomeAttribute
1  1        yellow
2  1        red
3  2        yellow
4  3        green
5  3        black
6  3        purple
7  4        white

Previously I was returning all persons to the API as separate objects. So if the user set the limit to 3, I just set the query's maxResults in Hibernate to 3 and returned:

{"PersonID": 1, "attr":"yellow"}
{"PersonID": 1, "attr":"red"}
{"PersonID": 2, "attr":"yellow"}

and if someone specified a limit of 3 and page 2 (setMaxResults(3), setFirstResult(3), since the offset is zero-based), it would be:

{"PersonID": 3, "attr":"green"}
{"PersonID": 3, "attr":"black"}
{"PersonID": 3, "attr":"purple"}

But now I want to select people and combine each person's rows into one JSON object, like this:

{
 "PersonID":3, 
 "attrs": [
    {"attr":"green"},
    {"attr":"black"},
    {"attr":"purple"}
 ]
}

And here is the problem. Is there any way in PostgreSQL or Hibernate to limit not by the number of rows but by the number of distinct person IDs? If the user sets the limit to 4, I should return persons 1, 2, 3 and 4, but with my current limiting mechanism I would return person 1 with 2 attributes, person 2 with 1 attribute, and person 3 with only one of its three attributes. Pagination has the same problem: I can end up returning part of person 3's attrs array on one page and the rest on the next page.

asked Oct 11 '25 14:10 by baant


1 Answer

You can use row_number() to limit the number of rows per person, and GROUP BY with LIMIT to limit by the number of people:

-- Test data
CREATE TABLE person AS 
    WITH tmp ("ID", "PersonId", "SomeAttribute") AS (   
        VALUES 
            (1, 1, 'yellow'::TEXT), 
            (2, 1, 'red'),
            (3, 2, 'yellow'),
            (4, 3, 'green'),
            (5, 3, 'black'),
            (6, 3, 'purple'),
            (7, 4, 'white')
        )
    SELECT * FROM tmp;

-- Returning as normal columns (at most 3 attributes per person)
SELECT * FROM (
    SELECT
        "PersonId",
        "SomeAttribute",
        -- Number the rows within each person; ordering by "ID" keeps the numbering deterministic
        row_number() OVER(PARTITION BY "PersonId" ORDER BY "ID") AS rownum
    FROM
        person) AS tmp
WHERE rownum <= 3;

-- Returning as normal columns (overall limit of 4 rows, not 4 people)
SELECT * FROM (
    SELECT
        "PersonId",
        "SomeAttribute",
        -- Number all rows; "ID" breaks ties between rows of the same person
        row_number() OVER(ORDER BY "PersonId", "ID") AS rownum
    FROM
        person) AS tmp
WHERE rownum <= 4;

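-- Returning as a JSON column (at most 3 attributes per person)
-- json_agg + json_build_object yields an array like [{"attr": ...}, ...], the shape asked for in the question;
-- json_object_agg with a constant key would repeat the same key inside a single object
SELECT "PersonId", json_agg(json_build_object('attr', "SomeAttribute")) AS attributes FROM (
    SELECT
        "PersonId",
        "SomeAttribute",
        row_number() OVER(PARTITION BY "PersonId" ORDER BY "ID") AS rownum
    FROM
        person) AS tmp
WHERE rownum <= 3 GROUP BY "PersonId";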

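-- Returning as a JSON column (limit by person: one row per person, so LIMIT counts people)
SELECT "PersonId", json_agg(json_build_object('attr', "SomeAttribute")) AS attributes FROM (
    SELECT
        "PersonId",
        "SomeAttribute"
    FROM
        person) AS tmp
GROUP BY "PersonId"
-- Without ORDER BY, which 4 people are returned is not guaranteed
ORDER BY "PersonId"
LIMIT 4;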
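
For pagination, the same grouped query can take an OFFSET. Because grouping produces one row per person, LIMIT and OFFSET then count people rather than attribute rows. The following is a minimal sketch against the test table above; the page size of 2, the page number, the `attrs` alias and the `'attr'` key are assumptions chosen to mirror the question, not part of the original answer.

```sql
-- Page 2 with an assumed page size of 2 people: skip 2 people, return the next 2.
SELECT
    "PersonId",
    -- Build one JSON array per person; ORDER BY "ID" keeps the attributes in insertion order
    json_agg(json_build_object('attr', "SomeAttribute") ORDER BY "ID") AS attrs
FROM person
GROUP BY "PersonId"
-- A stable ordering is needed so that pages do not overlap or skip people
ORDER BY "PersonId"
LIMIT 2 OFFSET 2;
```

With the sample data this should return persons 3 and 4, and person 3 keeps all three attributes on the same page.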

In this case, of course, you must use a native query, but this is a small trade-off IMHO.
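
If the rows should stay flat so that the application groups them itself, one possible alternative is dense_rank(), which gives every row of the same person the same rank. Filtering on that rank limits by distinct person IDs instead of by rows. This is only a sketch: the `person_rank` and `ranked` names and the literal bounds are illustrative assumptions, and in a native query the bounds would normally be bound parameters.

```sql
-- Flat rows for an assumed page 2 with 2 people per page (ranks 3 and 4).
SELECT "ID", "PersonId", "SomeAttribute"
FROM (
    SELECT
        "ID",
        "PersonId",
        "SomeAttribute",
        -- All rows of one person share a rank; ranks increase by 1 per distinct "PersonId"
        dense_rank() OVER (ORDER BY "PersonId") AS person_rank
    FROM person
) AS ranked
-- Lower bound = page offset in people, upper bound = offset + page size
WHERE person_rank > 2 AND person_rank <= 4
ORDER BY "PersonId", "ID";
```

With the sample data this should return the rows with IDs 4 to 7, that is every attribute of persons 3 and 4 and nothing from other people.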

More info here and here.

answered Oct 15 '25 12:10 by Michel Milezzi