
Why would anyone check 'x in list'?

Tags: python, list, set

In Python one can very easily check whether a value is contained in a container by using the in operator. I was wondering, though, why anyone would ever use the in operator on a list when it's much more efficient to first transform the list into a set, like this:

if x in [1,2,3]:

as opposed to

if x in set([1,2,3]):

Looking at the time complexity, the first one is O(n) while the second one is superior at O(1). Is the only reason to use the first one that it's more readable and shorter to write? Or is there a special case in which it's more practical? Why did the Python devs not implement the first one by translating it into the second one? Would that not grant both of them O(1) complexity?

Joost asked Nov 29 '22 01:11


1 Answer

if x in set([1,2,3]):

is not faster than

if x in [1,2,3]:

Converting a list to a set requires iterating over the whole list, and is thus at least O(n) time.* In practice it takes considerably longer than a single linear search for an item, since it involves hashing every item and inserting it into the set, whereas the search can stop as soon as it finds a match.
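
To make the cost difference concrete, here is a small sketch that counts how often elements are hashed or compared. The Counted class and cost helper are hypothetical, written only for this illustration. Searching the list for 500 stops at the first match, so it does about 500 equality checks and no hashing, while set(items) has to hash all 1000 elements before the single lookup can even start.

```python
class Counted:
    """Hypothetical helper: wraps an int and counts __hash__ / __eq__ calls."""
    hashes = 0
    compares = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        Counted.hashes += 1
        return hash(self.value)

    def __eq__(self, other):
        Counted.compares += 1
        return isinstance(other, Counted) and self.value == other.value


def cost(expr):
    """Reset the counters, run expr(), and return (hash calls, eq calls)."""
    Counted.hashes = Counted.compares = 0
    expr()
    return Counted.hashes, Counted.compares


items = [Counted(i) for i in range(1000)]
target = Counted(500)

# A list search compares element by element and stops at the first match.
list_cost = cost(lambda: target in items)
# Converting to a set hashes every element before the single lookup runs.
set_cost = cost(lambda: target in set(items))
print("list:", list_cost, "set:", set_cost)
```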

Using a set pays off when the set is built once and then checked multiple times. Indeed, timing a search for 500 in a 1000-element list built from range(1000) shows that the break-even point is reached once you check at least 3 times:

import timeit

def time_list(x, lst, num):
    # Search the list num times; each search is a linear scan
    for _ in range(num):
        x in lst

def time_turn_set(x, lst, num):
    # Convert to a set once, then do num hash lookups
    s = set(lst)
    for _ in range(num):
        x in s

size = 1000
# list(...) matters on Python 3: a bare range object answers `in` for ints without scanning
setup_str = "lst = list(range(%d)); from __main__ import %s"

for num in range(1, 10):
    t_list = timeit.timeit("time_list(%d, lst, %d)" % (size // 2, num),
                           setup=setup_str % (size, "time_list"), number=10000)
    t_set = timeit.timeit("time_turn_set(%d, lst, %d)" % (size // 2, num),
                          setup=setup_str % (size, "time_turn_set"), number=10000)
    print(num, t_list, t_set)

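gives me (columns: number of lookups, total seconds for the list version, total seconds for the convert-to-set version; measured with Python 2):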

1 0.124024152756 0.334127902985
2 0.250166893005 0.343378067017
3 0.359009981155 0.356444835663
4 0.464100837708 0.38081407547
5 0.600295066833 0.34722495079
6 0.692923069 0.358560085297
7 0.787877082825 0.338326931
8 0.877299070358 0.344762086868
9 1.00078821182 0.339591026306
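
The break-even point can also be read off the table with a little arithmetic. The sketch below copies the numbers above into a list and applies a rough cost model, which is an assumption rather than something the benchmark measures directly: the list column grows linearly with the number of lookups, while the set column is dominated by the one-off conversion, so the break-even count is roughly the conversion cost divided by the cost of one list search.

```python
# (lookups, list_seconds, set_seconds) copied from the output above
rows = [
    (1, 0.124024152756, 0.334127902985),
    (2, 0.250166893005, 0.343378067017),
    (3, 0.359009981155, 0.356444835663),
    (4, 0.464100837708, 0.38081407547),
    (5, 0.600295066833, 0.34722495079),
    (6, 0.692923069, 0.358560085297),
    (7, 0.787877082825, 0.338326931),
    (8, 0.877299070358, 0.344762086868),
    (9, 1.00078821182, 0.339591026306),
]


def break_even(rows):
    """Smallest lookup count at which convert-then-lookup is no slower than the list."""
    for num, t_list, t_set in rows:
        if t_set <= t_list:
            return num
    return None


# Rough model: slope of the list column = cost of one list search;
# mean of the set column ~ cost of the conversion (lookups are cheap).
per_lookup_list = (rows[-1][1] - rows[0][1]) / (rows[-1][0] - rows[0][0])
conversion = sum(t_set for _, _, t_set in rows) / len(rows)
print(break_even(rows), conversion / per_lookup_list)
```

Both the first crossing in the table (3 lookups) and the ratio (a little over 3) point to the same break-even.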

Tests with list sizes ranging from 500 to 50000 give roughly the same result.
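
In everyday code the takeaway is to build the set once, outside any loop, and reuse it for every membership test. A minimal sketch, where filter_known is a hypothetical helper name:

```python
def filter_known(candidates, known):
    """Keep the candidates that appear in `known` (hypothetical helper).

    The set is built once (one pass over `known`), so each membership
    test inside the comprehension is an average O(1) hash lookup
    instead of a linear scan of the list.
    """
    known_set = set(known)
    return [c for c in candidates if c in known_set]


known = list(range(0, 1000, 2))   # 500 even numbers
candidates = list(range(10))
print(filter_known(candidates, known))  # [0, 2, 4, 6, 8]
```

Writing if c in set(known) inside the comprehension would instead rebuild the set for every candidate, paying the conversion cost measured above each time.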

* Strictly speaking, inserting into a hash table (and, for that matter, looking up a value) is O(1) only on average. In the worst case, when many keys collide, each operation degrades to O(n), which would make the set([1,2,3]) operation O(n^2) rather than O(n). In practice, with a reasonable hash function and an implementation that resizes the table as it grows (as CPython's set does), you can basically always treat insertion and lookup in a hash table as O(1) operations.
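
The worst case mentioned in the footnote can be provoked on purpose. In this sketch the Key class and compares_to_build helper are hypothetical: with collide=True every key returns the same hash, so each new key has to be compared against every key already in the set, and building a set of n distinct keys takes at least n(n-1)/2 equality checks. With well-spread hashes the same build needs no equality checks at all.

```python
class Key:
    """Hypothetical key type that counts __eq__ calls; hash quality is configurable."""
    compares = 0

    def __init__(self, value, collide):
        self.value = value
        self.collide = collide

    def __hash__(self):
        # collide=True sends every key to the same hash value (worst case)
        return 0 if self.collide else hash(self.value)

    def __eq__(self, other):
        Key.compares += 1
        return self.value == other.value


def compares_to_build(n, collide):
    """Count equality checks needed to build a set of n distinct keys."""
    Key.compares = 0
    set(Key(i, collide) for i in range(n))
    return Key.compares


n = 200
print(compares_to_build(n, collide=False))  # distinct hashes
print(compares_to_build(n, collide=True))   # all hashes equal
```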

David Robinson answered Dec 05 '22 11:12