Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing Disjoint Set Approximation (Union Find) in SQL

What would be the best way to implement approximate Disjoint Sets using SQL?

Details

I have a table of edges, stored as a two-column table of [vertex_a, vertex_b].

I need a table of distinct sets, stored as [vertex, set_id] with one row per vertex, labeling each vertex with a disjoint set_id.

Constraints

  • Must be a purely SQL implementation. It can leverage Postgres-specific functions, but pure ANSI SQL highly preferred.
  • The result can be approximate- it's acceptable to label a few sets as disjoint when they're actually connected. It's even better if the approximation bounds can be adjusted- by increasing iterations for example.
  • Libraries are out (no Boost, Numpy, Scipy). Must be SQL.
  • Most sets will contain 1 to 3 vertices. Very few large sets, expected max to be 10 vertices.

Related

  • Related to: Implementing Disjoint Sets (Union Find) in C++
  • This would be an approximate implementation of Disjoint-set (Union Find) - Wikipedia
like image 343
supyo Avatar asked Nov 09 '22 07:11

supyo


1 Answers

I'm actually working on the same problem. Unfortunately, I don't think a very efficient solution can be found - at least not easily using just SQL. Just remove the 'distinct' and the self-eliminating query to observe how large the working sets become. That said, the following solution will work.

drop table if exists foobar;
drop function if exists addset(v int[], a int);
/* our vertices table */
create table foobar (
   src int,
   dst int
);

/* Create a small function to treat an array like a set, 
   not strictly necessary but convenient */
create function addset(v int[], a int) returns int[]
as $$
begin
    return (select array_agg(e order by e) 
                   from (select unnest(v || a) as e) f);
end
$$ language plpgsql;

/* fill our table with vertices, note the ordering of each value */
insert into foobar (src, dst) 
     values (1,2), (1,3), (2,3),  (3,4), (4,5), (6,7), (6,8);
/* use a recursive query to extend the sets */
with recursive foo_union (v) as (
    select array[src, dst] as v from foobar /* starter sets */
    union all
    /* join self to original array; i can use a CTE as a 'starter',
       but that is not necessary here */
    select addset(v, dst) as v from foo_union u, foobar f
        where src = any(v) and not dst = any(v)
 ) select distinct v from foo_union a where not exists (
     /* eliminate the many overlapping results */
     select * from foo_union b where b.v @> a.v and b.v != a.v
 );

But again, this is completely impractical on larger data sets; any other solution would require iteration.

like image 80
Bart Avatar answered Nov 14 '22 21:11

Bart