Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

list comprehension filtering - "the set() trap"

A reasonably common operation is to filter one list based on another list. People quickly find that this:

[x for x in list_1 if x in list_2] 

is slow for large inputs - it's O(n*m). Yuck. How do we speed this up? Use a set to make filtering lookups O(1):

s = set(list_2) [x for x in list_1 if x in s] 

This gives nice overall O(n) behavior. I however often see even veteran coders fall into The Trap™:

[x for x in list_1 if x in set(list_2)] 

Ack! This is again O(n*m) since python builds set(list_2) every time, not just once.


I thought that was the end of the story - python can't optimize it away to only build the set once. Just be aware of the pitfall. Gotta live with it. Hmm.

#python 3.3.2+ list_2 = list(range(20)) #small for demonstration purposes s = set(list_2) list_1 = list(range(100000)) def f():     return [x for x in list_1 if x in s] def g():     return [x for x in list_1 if x in set(list_2)] def h():     return [x for x in list_1 if x in {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19}]  %timeit f() 100 loops, best of 3: 7.31 ms per loop  %timeit g() 10 loops, best of 3: 77.4 ms per loop  %timeit h() 100 loops, best of 3: 6.66 ms per loop 

Huh, python (3.3) can optimize away a set literal. It's even faster than f() in this case, presumably because it gets to replace a LOAD_GLOBAL with a LOAD_FAST.

#python 2.7.5+ %timeit h() 10 loops, best of 3: 72.5 ms per loop 

Python 2 notably doesn't do this optimization. I've tried investigating further what python3 is doing but unfortunately dis.dis cannot probe the innards of comprehension expressions. Basically everything interesting turns into MAKE_FUNCTION.

So now I'm wondering - why can python 3.x optimize away the set literal to only build once, but not set(list_2)?

like image 222
roippi Avatar asked Nov 18 '13 19:11

roippi


People also ask

Which is faster filter or list comprehension?

1 Answer. Actually, list comprehension is much clearer and faster than filter+lambda, but you can use whichever you find easier.

Do you need brackets for list comprehension python?

Generator Expression. We learned that we use curly brackets for the set comprehension and square brackets for the list comprehension.

How do you filter a list of strings?

Filter a list of string using filter() method. filter() method accepts two parameters. The first parameter takes a function name or None and the second parameter takes the name of the list variable as values. filter() method stores those data from the list if it returns true, otherwise, it discards the data.

Are list comprehensions slow?

List comprehensions are faster than for loops to create lists. But, this is because we are creating a list by appending new elements to it at each iteration. This is slow.


2 Answers

In order to optimize set(list_2), the interpreter needs to prove that list_2 (and all of its elements) does not change between iterations. This is a hard problem in the general case, and it would not surprise me if the interpreter does not even attempt to tackle it.

On the other hand a set literal cannot change its value between iterations, so the optimization is known to be safe.

like image 63
NPE Avatar answered Sep 22 '22 03:09

NPE


From What’s New In Python 3.2:

Python’s peephole optimizer now recognizes patterns such x in {1, 2, 3} as being a test for membership in a set of constants. The optimizer recasts the set as a frozenset and stores the pre-built constant.

like image 21
Ashwini Chaudhary Avatar answered Sep 18 '22 03:09

Ashwini Chaudhary