I have a table of rows with the following structure name TEXT, favorite_colors TEXT[], group_name INTEGER
where each row has a list of everyone's favorite colors and the group that person belongs to. How can I GROUP BY group_name
and return a list of the most common colors in each group?
Could you do a combination of int[] && int[]
to set for overlap, int[] & int[]
to get the intersection and then something else to count and rank?
Quick and dirty:
SELECT group_name, color, count(*) AS ct
FROM (
SELECT group_name, unnest(favorite_colors) AS color
FROM tbl
) sub
GROUP BY 1,2
ORDER BY 1,3 DESC;
LATERAL JOIN
In Postgres 9.3 or later this is the cleaner form:
SELECT group_name, color, count(*) AS ct
FROM tbl t, unnest(t.favorite_colors) AS color
GROUP BY 1,2
ORDER BY 1,3 DESC;
The above is shorthand for
...
FROM tbl t
JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE
...
And like with any other INNER JOIN
, it would exclude rows without color (favorite_colors IS NULL
) - as did the first query.
To include such rows in the result, use instead:
SELECT group_name, color, count(*) AS ct
FROM tbl t
LEFT JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE
GROUP BY 1,2
ORDER BY 1,3 DESC;
You can easily aggregate the "most common" colors per group in the next step, but you'd need to define "most common colors" first ...
As per comment, pick colors with > 3 occurrences.
SELECT t.group_name, color, count(*) AS ct
FROM tbl t, unnest(t.favorite_colors) AS color
GROUP BY 1,2
HAVING count(*) > 3
ORDER BY 1,3 DESC;
To aggregate the top colors in an array (in descending order):
SELECT group_name, array_agg(color) AS top_colors
FROM (
SELECT group_name, color
FROM tbl t, unnest(t.favorite_colors) AS color
GROUP BY 1,2
HAVING count(*) > 3
ORDER BY 1, count(*) DESC
) sub
GROUP BY 1;
-> SQLfiddle demonstrating all.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With