Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Collapse multiple rows of arrays if the arrays overlap

I have a table in PostgreSQL 9.3 that contains a single column containing an array in each row. I am trying to find a way to collapse rows of arrays that share the same elements.

Examples

simple overlap

Given the following two rows with arrays:

{ 1, 2, 3 }
{ -5, 3, 6, 9 }

The result would be a row containing:

{ -5, 1, 2, 3, 6, 9 }

This is because the "3" exists in both arrays. Note that the "3" is not repeated.

multiple overlaps

The same overlap concept can also apply to multiple rows anywhere in the table:

{ 1, 2, 3 }
{ 100, 200, 300 }
{ 3, 4, 5 }
{ 5, 6, 7 }

and the desired output would be two rows :

{ 1, 2, 3, 4, 5, 6, 7}
{ 100, 200, 300 }

The arrays returned from the result should be unique and share no elements with each other.

What I've tried

I used a "with recursive" query with array union function, but couldn't figure out the right query.

A sample table to work with is provided here on SQL fiddle (it mimics the second example), or it can be built with:

create table test ( 
  arr integer[] 
); 

insert into test (arr) values ('{ 1, 2, 3 }');
insert into test (arr) values ('{ 100, 200, 300 }');
insert into test (arr) values ('{ 3, 4, 5 }');
insert into test (arr) values ('{ 5, 6, 7 }');
like image 550
Hong Xia Avatar asked Feb 11 '14 23:02

Hong Xia


1 Answers

OK, it was tough. Please have a look at this query:

;with recursive minelem AS(
select arr, MIN(unnest) minel from (select arr, unnest(arr) from test) a group by arr),
testwithrn as(
select arr, row_number() over (order by minel) rn from minelem
),
cte(arr, rn, counter, grp) as(
  select arr, rn, 1, 1 from testwithrn where rn = 1
union all 
  select 
    case when array_length(a.arr & b.arr, 1) > 0 then a.arr | b.arr else b.arr end, 
    b.rn, 
    case when array_length(a.arr & b.arr, 1) > 0 then a.counter + 1 else 1 end,
    case when array_length(a.arr & b.arr, 1) > 0 then a.grp else a.grp + 1 end
    from cte a inner join testwithrn b 
    on b.rn > a.rn
),
grouped as(
  SELECT arr, counter, grp,
  row_number() over (partition by grp order by counter desc) rn from cte)
select distinct arr from grouped where rn = 1

SQL Fiddle

You can test different CTEs in the query above to understand how I have come up with the solution. The key here is to use operator | to merge arrays, as in a.arr | b.arr

There is a recursive query called cte that counts the occurrence of each set within different groups of sets. You can replace the last line to select * from cte order by grp, counter to see how the counter and grp are changed when the sets are recursively built

like image 151
cha Avatar answered Oct 17 '22 17:10

cha