 

Average complexity of the multiple-set intersection operation and its under-the-hood implementation

The Python wiki does not mention the average-case complexity of the multiple-set intersection:

https://wiki.python.org/moin/TimeComplexity

Only the worst-case complexity is given:

(n-1)*O(l), where l is max(len(s1), ..., len(sn))

What is the average complexity of the multiple-set intersection operation? How is this operation implemented under the hood?

set.intersection(s1, s2, s3, s4, ..., sn)
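For concreteness, here is a small example of the call in question; the sets s1 to s4 are made up purely for illustration. Calling set.intersection with several arguments gives the same result as chaining the pairwise & operator.

```python
# Example sets, chosen only for illustration.
s1 = {1, 2, 3, 4, 5}
s2 = {2, 3, 4, 5}
s3 = {3, 4, 5, 6}
s4 = {4, 5, 7}

# Unbound call: equivalent to s1.intersection(s2, s3, s4).
multi = set.intersection(s1, s2, s3, s4)

# The same result built by chaining pairwise intersections with &.
chained = s1 & s2 & s3 & s4

print(multi)             # {4, 5}
print(multi == chained)  # True
```

Note that the arguments after the first may be any iterable, not only sets, which matters for the complexity discussion in the answer.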

Is the multiple-set intersection implemented differently from the two-set intersection? Their worst-case complexities differ according to the Python wiki:

Two-set intersection: O(len(s) * len(t))
Multiple-set intersection: (n-1)*O(l), where l is max(len(s1), ..., len(sn))

So applying the multiple-set formula to two sets should give:

--> (2-1)*O(l), where l is max(len(s1), len(s2))
--> O(max(len(s1), len(s2)))

I think this is quite different from the complexity given for the two-set intersection.

On a side note, is there a better way than set intersection to check membership across different sets?
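Whether something is "better" depends on what exactly is being checked. As a sketch of two common alternatives (the helper names shares_any and in_all are hypothetical): set.isdisjoint answers "do these two sets share any element?" without materializing the intersection, and a plain in test per set answers "is this one value in all of them?".

```python
def shares_any(a, b):
    # isdisjoint() is True when a and b have no common element; unlike
    # a & b it does not need to build a result set first.
    return not a.isdisjoint(b)


def in_all(x, sets):
    # One average-O(1) hash lookup per set; all() stops at the first miss.
    return all(x in s for s in sets)


a, b, c = {1, 2, 3}, {3, 4}, {5}
print(shares_any(a, b))      # True
print(shares_any(a, c))      # False
print(in_all(3, [a, b]))     # True
print(in_all(3, [a, b, c]))  # False
```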

NOTE: I am looking for an explanation rather than just the complexity O() notation :)

utengr asked Nov 29 '25 16:11



1 Answer

As already answered in a similar question, the implementation of the intersection of two sets is analogous to:

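def intersect(a, b):
    # Iterate over the smaller set and probe the larger one,
    # so the loop runs min(len(a), len(b)) times.
    if len(a) > len(b):
        a, b = b, a

    c = set()
    for x in a:
        if x in b:  # average O(1) hash lookup
            c.add(x)
    return c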
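To see why the two-set cost tracks the smaller set, the intersect() sketch above can be instrumented with a counter of membership tests. The function name intersect_counting and the counter are hypothetical additions for illustration only; CPython does not expose such a count.

```python
def intersect_counting(a, b):
    # Same strategy as intersect(): iterate over the smaller set.
    if len(a) > len(b):
        a, b = b, a
    result = set()
    lookups = 0
    for x in a:
        lookups += 1  # one hash lookup per element of the smaller set
        if x in b:
            result.add(x)
    return result, lookups


small = set(range(10))
large = set(range(5, 100_000))
common, n = intersect_counting(large, small)
print(sorted(common))  # [5, 6, 7, 8, 9]
print(n)               # 10
```

Even though large has almost 100,000 elements, only 10 lookups are performed, which is where the average O(min(len(s), len(t))) comes from.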

For multiple sets it is implemented as a chain of pairwise intersections roughly equivalent to:

def intersect_multi(a, *others):
    result = a.copy()
    for other in others:
        newresult = result.intersection(other)
        if not newresult:
            # Empty intersection: later sets cannot add anything back.
            return set()
        result = newresult
    return result

The average complexity is probably not given because it depends on whether the loop returns early, before going through all of others, because an intermediate intersection turned out empty. It can therefore be anywhere between O(k), where k is the length of the first set in others, and the worst case.
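The early-exit effect can be made visible with a variant of the chained sketch that also reports how many of the other sets were actually processed. This follows the answer's sketch rather than CPython's C code, and the name intersect_multi_steps and its step counter are hypothetical.

```python
def intersect_multi_steps(a, *others):
    # Chain of pairwise intersections that stops once the result is empty.
    # Returns (intersection, number of `others` actually processed).
    result = a.copy()
    steps = 0
    for other in others:
        steps += 1
        result = result.intersection(other)
        if not result:
            break  # nothing can survive the remaining intersections
    return result, steps


# {1, 2, 3} & {4, 5} is already empty, so the last two sets are never touched.
res, steps = intersect_multi_steps({1, 2, 3}, {4, 5}, {1, 2}, {2, 3})
print(res, steps)  # set() 1
```

With non-empty intermediate results every set is processed, which is the path the worst-case bound describes.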

The worst-case complexity is then (N-1) * max(O(set_intersection)). O(set_intersection) is usually O(min(k, l)), as the wiki lists for the average case of two sets, but O(max(k, l)) if the second argument is not a set. I guess that case is what is included here, so the cost is basically determined by the longest set.
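The "not a set" case can be observed by passing a wrapper that counts how many items are pulled from it. The CountingIterable class is a hypothetical helper; the sketch assumes CPython, where an argument that is not a set or frozenset is iterated in full, so the cost follows the length of that argument rather than the smaller side.

```python
class CountingIterable:
    """Hypothetical wrapper that counts how many items are consumed."""

    def __init__(self, items):
        self.items = list(items)
        self.consumed = 0

    def __iter__(self):
        for item in self.items:
            self.consumed += 1
            yield item


small = {1, 2}
big_list = CountingIterable(range(1000))
result = small.intersection(big_list)
print(result)             # {1, 2}
print(big_list.consumed)  # 1000: every item was visited
```

Only two elements can possibly match, yet all 1000 items are iterated and looked up in small.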

The worst case for O(set_intersection) stated in the wiki is very unlikely to occur, as Raymond Hettinger noted in a post. Apparently it only occurs when every lookup hits a hash collision, so that x in b becomes O(n), its worst-case complexity.
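That degenerate case can be forced deliberately with a key type whose instances all share one hash value. The Colliding class below is hypothetical and exists only to count equality comparisons: a lookup for a missing key then has to compare against every stored element, so a single x in b costs O(n) instead of O(1).

```python
class Colliding:
    """Hypothetical key type whose instances all hash to the same value."""

    eq_calls = 0  # class-wide count of __eq__ invocations

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # deliberately constant: every instance collides

    def __eq__(self, other):
        Colliding.eq_calls += 1
        return isinstance(other, Colliding) and self.value == other.value


n = 200
b = {Colliding(i) for i in range(n)}  # building b is itself quadratic

Colliding.eq_calls = 0
found = Colliding(-1) in b  # every stored element is a hash match
print(found, Colliding.eq_calls)  # False, with at least n comparisons
```

With ordinary, well-distributed hashes the same lookup almost never calls __eq__, which is why the quadratic two-set bound is rarely seen in practice.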

It seems this worst case is not included in the worst-case complexity of the multiple-set intersection (perhaps because a hash collision for every member of every set is so unlikely?).

Graipher answered Dec 02 '25 06:12



