Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for merging sets that share at least 2 elements

Given a list of sets:

  • S_1 : [ 1, 2, 3, 4 ]
  • S_2 : [ 3, 4, 5, 6, 7 ]
  • S_3 : [ 8, 9, 10, 11 ]
  • S_4 : [ 1, 8, 12, 13 ]
  • S_5 : [ 6, 7, 14, 15, 16, 17 ]

What the most efficient way to merge all sets that share at least 2 elements? I suppose this is similar to a connected components problem. So the result would be:

  • [ 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17] (S_1 UNION S_2 UNION S_5)
  • [ 8, 9, 10, 11 ]
  • [ 1, 8, 12, 13 ] (S_4 shares 1 with S_1, and 8 with S_3, but not merged because they only share one element in each)

The naive implementation is O(N^2), where N is the number of sets, which is unworkable for us. This would need to be efficient for millions of sets.

like image 493
bajafresh4life Avatar asked Nov 23 '08 20:11

bajafresh4life


1 Answers

Let there be a list of many Sets named (S)

Perform a pass through all elements of S, to determine the range (LOW .. HIGH).

Create an array of pointer to Set, of dimensions (LOW, HIGH), named (M).

do
    Init all elements of M to NULL.   

    Iterate though S, processing them one Set at a time, named (Si).

        Permutate all ordered pairs in Si. (P1, P2) where P1 <= P2.
        For each pair examine M(P1, P2)
            if M(P1, P2) is NULL
                Continue with the next pair.
            otherwise
                Merge Si, into the Set pointed to by, M(P1, P2).
                Remove Si from S, as it has been merged.
                Move on to processing Set S(i + 1)

        If Si was not merged, 
            Permutate again through Si
            For each pair, make M(P1, P2) point to Si.

while At least one set was merged during the pass.

My head is saying this is about Order (2N ln N). Take that with a grain of salt.

like image 77
EvilTeach Avatar answered Sep 25 '22 02:09

EvilTeach