I have a table in PostgreSQL 9.3 that contains a single column containing an array in each row. I am trying to find a way to collapse rows of arrays that share the same elements.
simple overlap
Given the following two rows with arrays:
{ 1, 2, 3 }
{ -5, 3, 6, 9 }
The result would be a row containing:
{ -5, 1, 2, 3, 6, 9 }
This is because the "3" exists in both arrays. Note that the "3" is not repeated.
multiple overlaps
The same overlap concept can also apply to multiple rows anywhere in the table:
{ 1, 2, 3 }
{ 100, 200, 300 }
{ 3, 4, 5 }
{ 5, 6, 7 }
and the desired output would be two rows :
{ 1, 2, 3, 4, 5, 6, 7}
{ 100, 200, 300 }
The arrays returned from the result should be unique and share no elements with each other.
I used a "with recursive" query with array union function, but couldn't figure out the right query.
A sample table to work with is provided here on SQL fiddle (it mimics the second example), or it can be built with:
create table test (
arr integer[]
);
insert into test (arr) values ('{ 1, 2, 3 }');
insert into test (arr) values ('{ 100, 200, 300 }');
insert into test (arr) values ('{ 3, 4, 5 }');
insert into test (arr) values ('{ 5, 6, 7 }');
OK, it was tough. Please have a look at this query:
;with recursive minelem AS(
select arr, MIN(unnest) minel from (select arr, unnest(arr) from test) a group by arr),
testwithrn as(
select arr, row_number() over (order by minel) rn from minelem
),
cte(arr, rn, counter, grp) as(
select arr, rn, 1, 1 from testwithrn where rn = 1
union all
select
case when array_length(a.arr & b.arr, 1) > 0 then a.arr | b.arr else b.arr end,
b.rn,
case when array_length(a.arr & b.arr, 1) > 0 then a.counter + 1 else 1 end,
case when array_length(a.arr & b.arr, 1) > 0 then a.grp else a.grp + 1 end
from cte a inner join testwithrn b
on b.rn > a.rn
),
grouped as(
SELECT arr, counter, grp,
row_number() over (partition by grp order by counter desc) rn from cte)
select distinct arr from grouped where rn = 1
SQL Fiddle
You can test different CTEs in the query above to understand how I have come up with the solution. The key here is to use operator | to merge arrays, as in a.arr | b.arr
There is a recursive query called cte
that counts the occurrence of each set within different groups of sets. You can replace the last line to select * from cte order by grp, counter
to see how the counter
and grp
are changed when the sets are recursively built
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With