Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Set Lookup Efficiency

I'm aware that python sets have O(1) lookup time and python lists have O(n) lookup time, but I'm curious about the container size at which it becomes worthwhile to convert a list to a set.

In other words, if I were to call the below:

arr = [1, 2, 3]
for i in range(1000000):
    random.randint(1,3) in arr

would it be more efficient than the calling the following?

s = set([1, 2, 3])
for i in range(1000000):
    random.randint(1,3) in s

More importantly, what is the crossover length?

EDIT: The consensus is that this is entirely dependent on the efficient of the hash method of user defined objects, but for primitives like string, int, etc -- the cutoff is around 1-3.

like image 925
Madison May Avatar asked Jan 02 '14 18:01

Madison May


1 Answers

Here's some code you can use to test it for yourself using timeit:

import timeit
for i in range(10):
    l = list(range(i))
    s = set(l)
    t1 = timeit.timeit(lambda: None in l, )
    t2 = timeit.timeit(lambda: None in s)
    print(i, t1, t2)

You should run this on the platform and Python implementation that you actually care about.

Also notice that I'm searching for None rather than 1, because searching for a value that's guaranteed to be the first (or second) thing in a list is constant-time, and that I'm using integers as in your initial test (which are, of course, trivial to hash). You should test on the actual data you care about.

Anyway, testing it on all of the implementations I have handy, I get a cutoff of 0 (64-bit PyPy 2.1.0/2.7.3) to 3 (32-bit PyPy 1.9.0/2.7.2), with most of them being 1-2. For example, here's 64-bit Python 3.3.2 crossing over at 1:

0 0.10865500289946795 0.11782343708910048
1 0.1330389219801873 0.11656044493429363

If you intentionally create an object that's slow to hash and doesn't cache, of course, you can push that cutoff as high as you want. For example, by putting a time.sleep(1) in my __hash__ method, it ends up being around 12M.

like image 125
abarnert Avatar answered Sep 24 '22 20:09

abarnert