Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove sets that are subsets

Tags:

algorithm

Suppose you are given two lists of sets of integers. Call them A and B. Every set in A could be contained within one of the sets in B. What is the most efficient algorithm to find all elements in A such that none are contained within a set of B?

like image 335
Student Avatar asked Nov 08 '22 01:11

Student


1 Answers

To try and trim the search space, we can try some checks and filters as part of the search. To take your example in the comments,

A = [{1,2},{1,4},{3,4,7}]
B = [{2,3,4},{1,2,4},{1,2,5,6}] 

start with an O(n) enumeration of unique elements in the sets as keys pointing to cross-correlated indexes of the sets they belong to:

1: A: {0,1}, B: {1,2}
2: A: {0}  , B: {0,1,2}
3: A: {2}  , B: {0}
4: A: {1,2}, B: {0,1}
5: A: {}   , B: {2}
6: A: {}   , B: {2}
7: A: {2}  , B: {}

We can immediately set aside sets in A that include an element not found in B, such as A's third set.

Now, we'll traverse each set in A that has not been excluded and check that there is a corresponding complete intersection of at least one set in B. Since the number of set-indexes in your case is in the millions, rather than initially traversing B in its entirety we will split our examination of B into sections, each of k sets, say 1024. Furthermore, to represent these 1024 set-indexes, we'll split them into 16 bitsets of 64 bits each so we may bitwise-AND (&) one with another.

At any point during this traversal, we benefit from an early exit if the AND operation results in zero:

set A[0] => elements 1,2 => set-index-intersection in B: b110 & b111 => match
set A[1] => elements 1,4 => set-index-intersection in B: b110 & b11  => match

Overall, working section by section, we're looking at around 10 * 16 operations to check if one set in A is included in one of the sets in the current section of k B-sets. In other words, we've reduced the number of operations from 10,000,000 to 160,000 for a complete check of one set in A (per one million sets in B). That's a factor of 62.

like image 159
גלעד ברקן Avatar answered Nov 28 '22 06:11

גלעד ברקן