A reasonably common operation is to filter one list based on another list. People quickly find that this:
[x for x in list_1 if x in list_2]
is slow for large inputs - it's O(n*m). Yuck. How do we speed this up? Use a set
to make filtering lookups O(1):
s = set(list_2)
[x for x in list_1 if x in s]
This gives nice overall O(n + m) behavior: building the set is O(m), and each of the n lookups is O(1). However, I often see even veteran coders fall into The Trap™:
[x for x in list_1 if x in set(list_2)]
Ack! This is again O(n*m) since python builds set(list_2) every time, not just once.
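To make the rebuild visible, here is a small sketch using a hypothetical tracked_set helper (not part of the original post) that counts how many times the set actually gets constructed:

```python
# tracked_set is a made-up wrapper around set() that counts its calls
calls = 0

def tracked_set(seq):
    global calls
    calls += 1
    return set(seq)

list_1 = [1, 2, 3]
list_2 = [2, 3]

# The condition is evaluated once per element of list_1,
# so tracked_set runs len(list_1) times, not once.
result = [x for x in list_1 if x in tracked_set(list_2)]
```

After running this, calls is 3 (one set construction per element of list_1), which is exactly the O(n*m) behavior described above.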
I thought that was the end of the story - python can't optimize it away to only build the set once. Just be aware of the pitfall. Gotta live with it. Hmm.
#python 3.3.2+
list_2 = list(range(20)) #small for demonstration purposes
s = set(list_2)
list_1 = list(range(100000))

def f():
    return [x for x in list_1 if x in s]

def g():
    return [x for x in list_1 if x in set(list_2)]

def h():
    return [x for x in list_1 if x in {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19}]

%timeit f()
100 loops, best of 3: 7.31 ms per loop
%timeit g()
10 loops, best of 3: 77.4 ms per loop
%timeit h()
100 loops, best of 3: 6.66 ms per loop
Huh, python (3.3) can optimize away a set literal. It's even faster than f() in this case, presumably because it gets to replace a LOAD_GLOBAL (looking up s each iteration) with a LOAD_CONST of a pre-built constant.
#python 2.7.5+
%timeit h()
10 loops, best of 3: 72.5 ms per loop
Python 2 notably doesn't do this optimization. I've tried investigating further what python3 is doing, but unfortunately dis.dis cannot probe the innards of comprehension expressions. Basically everything interesting turns into MAKE_FUNCTION.
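As an aside for anyone digging in on a newer interpreter: since Python 3.7, dis.dis recursively disassembles nested code objects, so the comprehension's bytecode is no longer hidden. A quick sketch (assuming CPython) that captures the listing and shows the set literal turning into a frozenset constant:

```python
import dis
import io

def h():
    return [x for x in range(10) if x in {1, 2, 3}]

# Capture the disassembly; dis.dis recurses into the comprehension's
# code object on 3.7+, and on 3.12+ the comprehension is inlined anyway.
buf = io.StringIO()
dis.dis(h, file=buf)
listing = buf.getvalue()

# The set literal shows up as a pre-built frozenset constant
# rather than being constructed each iteration.
print("frozenset" in listing)
```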
So now I'm wondering - why can python 3.x optimize away the set literal to only build once, but not set(list_2)?
1 Answer
In order to optimize set(list_2), the interpreter needs to prove that list_2 (and all of its elements) does not change between iterations. This is a hard problem in the general case, and it would not surprise me if the interpreter does not even attempt to tackle it.
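To see why hoisting would be unsafe, here is a contrived example (not from the original post) where list_2 really does change between iterations, so building the set once up front would give a different answer:

```python
list_2 = [0]

def sneaky(x):
    # Mutates list_2 as a side effect of evaluating the filter condition
    list_2.append(x)
    return x

# set(list_2) is rebuilt each iteration, so it always contains the
# value sneaky() just appended; every element passes the filter.
result = [x for x in [1, 2, 3] if sneaky(x) in set(list_2)]
```

Here result is [1, 2, 3]. If the interpreter had hoisted set(list_2) out of the loop, the set would be frozen at {0} and result would be empty, so the transformation is not semantics-preserving in general.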
On the other hand a set literal cannot change its value between iterations, so the optimization is known to be safe.
From What’s New In Python 3.2:
Python's peephole optimizer now recognizes patterns such as x in {1, 2, 3} as being a test for membership in a set of constants. The optimizer recasts the set as a frozenset and stores the pre-built constant.
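You can observe this directly on CPython by inspecting the compiled function's constants - the set literal is stored as a frozenset in co_consts rather than being rebuilt at runtime:

```python
def member(x):
    return x in {1, 2, 3}

# On CPython 3.2+, the peephole optimizer stores the set literal
# as a frozenset constant in the function's code object.
has_frozenset = any(isinstance(c, frozenset)
                    for c in member.__code__.co_consts)
print(has_frozenset)
```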