In Python one can very easily check whether a value is contained in a container by using the in operator. I was wondering why anyone would ever use the in operator on a list, though, when it seems much more efficient to first transform the list to a set. That is, why write:
if x in [1,2,3]:
as opposed to
if x in set([1,2,3]):
Looking at the time complexity, a membership test on a list is O(n) while one on a set is O(1). Is the only reason to use the first form that it's more readable and shorter to write? Or is there a special case in which it's more practical? Why did the Python devs not implement the first form by translating it to the second one? Would this not grant both of them O(1) complexity?
if x in set([1,2,3]):
is not faster than
if x in [1,2,3]:
Converting a list to a set requires iterating over the list, and is thus at least O(n) time.* In practice it takes a lot longer than searching for an item, since it involves hashing and then inserting every item.
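To make the cost of the conversion concrete, here is a minimal sketch that counts how often elements are hashed and compared. The Counted class, its counters, and reset_counters are hypothetical helpers introduced only for this illustration, and the counts assume CPython's list and set behavior.

```python
class Counted:
    """Hypothetical wrapper that counts hash and equality calls."""
    hash_calls = 0
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        Counted.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        Counted.eq_calls += 1
        return isinstance(other, Counted) and self.value == other.value


def reset_counters():
    Counted.hash_calls = 0
    Counted.eq_calls = 0


items = [Counted(i) for i in range(1000)]
target = Counted(500)

# Searching the list: elements are compared one by one until a match, no hashing.
reset_counters()
found_in_list = target in items
list_hashes, list_compares = Counted.hash_calls, Counted.eq_calls

# Converting first: every element is hashed before any lookup can happen.
reset_counters()
as_set = set(items)
build_hashes = Counted.hash_calls

print(list_hashes, list_compares)  # no hashes; comparisons stop at the match
print(build_hashes)                # one hash per element
```

The list search stops as soon as it finds the target, whereas the conversion always has to touch every element, which is why a single lookup cannot pay for it.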
Using a set pays off when the list is converted once and the resulting set is then checked multiple times. Indeed, trying this by searching for 500 in a list of the integers 0 to 999 indicates that the break-even point is reached once you are checking at least 3 times:
import timeit

def time_list(x, lst, num):
    # num membership tests directly against the list (linear scan each time)
    for n in range(num):
        x in lst

def time_turn_set(x, lst, num):
    # convert once, then num membership tests against the set
    s = set(lst)
    for n in range(num):
        x in s

for num in range(1, 10):
    size = 1000
    # list(...) is required in Python 3, where range() is not a list
    setup_str = "lst = list(range(%d)); from __main__ import %s"
    print(num,
          timeit.timeit("time_list(%d, lst, %d)" % (size // 2, num),
                        setup=setup_str % (size, "time_list"), number=10000),
          timeit.timeit("time_turn_set(%d, lst, %d)" % (size // 2, num),
                        setup=setup_str % (size, "time_turn_set"), number=10000))
gives me (columns: number of checks, list time in seconds, set time in seconds):
1 0.124024152756 0.334127902985
2 0.250166893005 0.343378067017
3 0.359009981155 0.356444835663
4 0.464100837708 0.38081407547
5 0.600295066833 0.34722495079
6 0.692923069 0.358560085297
7 0.787877082825 0.338326931
8 0.877299070358 0.344762086868
9 1.00078821182 0.339591026306
Tests with list sizes ranging from 500 to 50000 give roughly the same result.
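The "convert once, check many times" pattern can be sketched as follows. The function names and the sample data are assumptions made up for this illustration, not part of the benchmark above.

```python
def count_members(queries, allowed):
    """Count how many queries occur in allowed, converting allowed once."""
    allowed_set = set(allowed)  # O(n) conversion, paid a single time
    return sum(1 for q in queries if q in allowed_set)


def count_members_slow(queries, allowed):
    """Anti-pattern: rebuilds the set for every single check."""
    return sum(1 for q in queries if q in set(allowed))


allowed = list(range(1000))
queries = [500, 999, 1000, -1, 0]

print(count_members(queries, allowed))       # 3
print(count_members_slow(queries, allowed))  # 3, but pays the conversion 5 times
```

Both functions return the same answer; only the first one amortizes the conversion over all lookups, which is the situation where the set wins.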
* Indeed, in the true asymptotic sense inserting into a hash table (and, for that matter, checking a value) is not O(1) time, but rather a constant speedup of linear O(n) time (since if the list gets too large collisions will build up). That would make the set([1,2,3]) operation be in O(n^2) time rather than O(n). However, in practice, with reasonably sized lists and a good implementation, you can basically always assume insertion and lookup in a hash table to be O(1) operations.
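The effect of collisions can be provoked deliberately. The following is a rough sketch, assuming CPython's set implementation; the Key class and comparisons_to_build are hypothetical names used only here. With a constant hash every key collides, so building the set needs a number of equality checks that grows roughly quadratically with the number of keys.

```python
class Key:
    """Hypothetical key type whose hash quality can be switched off."""
    eq_calls = 0

    def __init__(self, value, good_hash):
        self.value = value
        self.good_hash = good_hash

    def __hash__(self):
        # A constant hash forces every key into the same probe sequence.
        return hash(self.value) if self.good_hash else 0

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def comparisons_to_build(n, good_hash):
    """Return how many __eq__ calls building a set of n distinct keys takes."""
    keys = [Key(i, good_hash) for i in range(n)]
    Key.eq_calls = 0
    set(keys)
    return Key.eq_calls


n = 200
good = comparisons_to_build(n, good_hash=True)   # distinct hashes: no comparisons
bad = comparisons_to_build(n, good_hash=False)   # every insert rechecks earlier keys
print(good, bad)
```

With well-distributed hashes the build needs no equality checks at all, which is the everyday case where treating insertion and lookup as O(1) is reasonable.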