Python3 Does input order matter for the .intersection() function in terms of runtime?

Let's say you have two sets: set1 is very large (a couple million values), and set2 is relatively small (a couple hundred thousand values). If I wanted to get the intersection of these two sets using the .intersection() method, would the order of the inputs affect the runtime?

For example, would one of these run faster than the other?

set1.intersection(set2)
set2.intersection(set1)

asked Jul 10 '20 17:07

DJ Wolfson

1 Answer

No, input order does not matter. In CPython (the standard Python implementation), the set_intersection function handles set intersection. In the case where the other argument is also a set, the function will swap the two sets such that the smaller one is iterated through while the larger set is used for (constant time) lookup, as Booboo described:

        if (PySet_GET_SIZE(other) > PySet_GET_SIZE(so)) {
            tmp = (PyObject *)so;
            so = (PySetObject *)other;
            other = tmp;
        }

        while (set_next((PySetObject *)other, &pos, &entry)) {
            key = entry->key;
            hash = entry->hash;
            rv = set_contains_entry(so, key, hash);
            if (rv < 0) {
                Py_DECREF(result);
                return NULL;
            }
            if (rv) {
                if (set_add_entry(result, key, hash)) {
                    Py_DECREF(result);
                    return NULL;
                }
            }
        }
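
In Python terms, the C excerpt above behaves roughly like the following sketch. The function name intersection_sketch and the variable names are hypothetical and chosen for illustration. It models only the set-versus-set case shown above and is not how CPython is actually written:

```python
def intersection_sketch(a, b):
    """Rough Python model of the set-vs-set branch shown above (illustrative only)."""
    # Mirror the swap in the C code: iterate over the smaller set
    # and use the larger one for membership lookups.
    if len(b) > len(a):
        large, small = b, a
    else:
        large, small = a, b

    result = set()
    for key in small:       # one pass over the smaller set
        if key in large:    # average constant-time hash lookup
            result.add(key)
    return result


set1 = set(range(100_000))
set2 = set(range(99_000, 101_000))

# Both argument orders do the same work and give the same result.
assert intersection_sketch(set1, set2) == set1.intersection(set2)
assert intersection_sketch(set2, set1) == set2.intersection(set1)
```

In this model the loop always runs over the smaller set, regardless of which set the method is called on.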

Thus, when set1 and set2 are both sets, set1.intersection(set2) and set2.intersection(set1) will have the same performance. A small empirical test with timeit agrees:

import random
import string
import timeit

big_set = set()
while len(big_set) < 1000000:
    big_set.add(''.join(random.choices(string.ascii_letters, k=6)))


small_set = set()
while len(small_set) < 10000:
    small_set.add(''.join(random.choices(string.ascii_letters, k=6)))

print("Timing...")
print(f"big_set.intersection(small_set): {min(timeit.Timer(lambda: big_set.intersection(small_set)).repeat(31, 500))}")
print(f"small_set.intersection(big_set): {min(timeit.Timer(lambda: small_set.intersection(big_set)).repeat(31, 500))}")

answered Sep 16 '22 11:09

xavc
