Suppose you are given two lists of sets of integers. Call them A and B. Every set in A could be contained within one of the sets in B. What is the most efficient algorithm to find all elements in A such that none are contained within a set of B?
To try and trim the search space, we can try some checks and filters as part of the search. To take your example in the comments,
A = [{1,2},{1,4},{3,4,7}]
B = [{2,3,4},{1,2,4},{1,2,5,6}]
start with an O(n) enumeration of unique elements in the sets as keys pointing to cross-correlated indexes of the sets they belong to:
1: A: {0,1}, B: {1,2}
2: A: {0} , B: {0,1,2}
3: A: {2} , B: {0}
4: A: {1,2}, B: {0,1}
5: A: {} , B: {2}
6: A: {} , B: {2}
7: A: {2} , B: {}
We can immediately set aside sets in A that include an element not found in B, such as A's third set.
Now, we'll traverse each set in A that has not been excluded and check that there is a corresponding complete intersection of at least one set in B. Since the number of set-indexes in your case is in the millions, rather than initially traversing B in its entirety we will split our examination of B into sections, each of k
sets, say 1024. Furthermore, to represent these 1024 set-indexes, we'll split them into 16 bitsets of 64 bits each so we may bitwise-AND (&) one with another.
At any point during this traversal, we benefit from an early exit if the AND operation results in zero:
set A[0] => elements 1,2 => set-index-intersection in B: b110 & b111 => match
set A[1] => elements 1,4 => set-index-intersection in B: b110 & b11 => match
Overall, working section by section, we're looking at around 10 * 16 operations to check if one set in A is included in one of the sets in the current section of k
B-sets. In other words, we've reduced the number of operations from 10,000,000 to 160,000 for a complete check of one set in A (per one million sets in B). That's a factor of 62.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With