I have a list of students and parents and would like to group them into families using the student id's. Parents who share common student id's can be considered to be a family while also students who share common parent id's can be considered to be a family. This is a sample table:
p_id | parent_name | s_id | student_name |
------------------------------------------|
1 | John Doe | 100 | Mike Doe |
3 | Jane Doe | 100 | Mike Doe |
3 | Jane Doe | 105 | Lisa Doe |
5 | Will Willy | 108 | William Son |
I'd like to end up with something like:
parents | students |
-------------------|------------------------|
John Doe, Jane Doe | Mike Doe, Lisa Doe |
Will Willy | William Son |
To achieve this I'm currently using:
SELECT array_agg(parents) AS parents FROM (
SELECT array_agg(p_id) AS par_ids, array_agg(parent_name) AS parents, student_name, s_id
FROM (
/* sub query */
)b
GROUP BY s_id, student_name
ORDER BY parents ASC
)c
GROUP BY unnest(par_ids)
ORDER BY parents ASC
But I get an error: ERROR: cannot accumulate arrays of different dimensionality. SQL state: 2202E
How can I attain the desired results? The inner query from the above statement returns:
| par_ids | parents | student_name | s_id |
--------------------------------|------------------------|
| {1,3} | {John Doe, Jane Doe}| Mike Doe | 100 |
| {3} | {Jane Doe} | Lisa Doe | 105 |
| {5} | {Will Willy} | William Son | 108 |
Grouping these students now to the parents is where I'm stuck.
I did something similar (but a bit more complex) already here: https://stackoverflow.com/a/53129510/3984221
step-by-step demo:db<>fiddle
SELECT
array_agg(parent_name) as parents, -- 4
array_agg(student_name) as students
FROM (
SELECT DISTINCT ON (t.s_id) -- 3
*
FROM (
SELECT
s_id,
array_agg(p_id) as parents -- 1
FROM mytable
GROUP BY s_id
) s JOIN mytable t ON t.p_id = ANY(s.parents) -- 2
ORDER BY t.s_id, CARDINALITY(parents) DESC -- 3
) s
GROUP BY parents
Aggregate the p_id values into an array:
| s_id | parents |
|---|---|
| 108 | {5} |
| 105 | {3} |
| 100 | {1,3} |
Self-join the original table on this array:
| s_id | parents | p_id | parent_name | s_id | student_name |
|---|---|---|---|---|---|
| 100 | {1,3} | 1 | John Doe | 100 | Mike Doe |
| 105 | {3} | 3 | Jane Doe | 100 | Mike Doe |
| 100 | {1,3} | 3 | Jane Doe | 100 | Mike Doe |
| 105 | {3} | 3 | Jane Doe | 105 | Lisa Doe |
| 100 | {1,3} | 3 | Jane Doe | 105 | Lisa Doe |
| 108 | {5} | 5 | Will Willy | 108 | William Son |
Remove all duplicate student records. The remaining ones should be the records with the most complete p_id array. This can be done using DISTINCT ON(s_id) on a descending order by the array length:
| s_id | parents | p_id | parent_name | s_id | student_name |
|---|---|---|---|---|---|
| 100 | {1,3} | 1 | John Doe | 100 | Mike Doe |
| 100 | {1,3} | 3 | Jane Doe | 105 | Lisa Doe |
| 108 | {5} | 5 | Will Willy | 108 | William Son |
Finally you can group by the p_id array and aggregate the two name columns:
| parents | students |
|---|---|
| {"John Doe","Jane Doe"} | {"Mike Doe","Lisa Doe"} |
| {"Will Willy"} | {"William Son"} |
If you don't want to get an array, but a string list, you can use string_agg(name_colum, ',') instead of array_agg(name_column)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With