Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python 3: Most efficient way to create a [func(i) for i in range(N)] list comprehension

Say I have a function func(i) that creates an object for an integer i, and N is some nonnegative integer. Then what's the fastest way to create a list (not a range) equal to this list

mylist = [func(i) for i in range(N)]

without resorting to advanced methods like creating a function in C? My main concern with the above list comprehension is that I'm not sure if python knows beforehand the length of range(N) to preallocate mylist, and therefore has to incrementally reallocate the list. Is that the case or is python clever enough to allocate mylist to length N first and then compute it's elements? If not, what's the best way to create mylist? Maybe this?

mylist = [None]*N
for i in range(N): mylist[i] = func(i)

RE-EDIT: Removed misleading information from a previous edit.

like image 703
mejiwa Avatar asked Mar 13 '10 20:03

mejiwa


2 Answers

Somebody wrote: """Python is smart enough. As long as the object you're iterating over has a __len__ or __length_hint__ method, Python will call it to determine the size and preallocate the array."""

As far as I can tell, there is no preallocation in a list comprehension. Python has no way of telling from the size of the INPUT what the size of the OUTPUT will be.

Look at this Python 2.6 code:

>>> def foo(func, iterable):
...     return [func(i) for i in iterable]
...
>>> import dis; dis.dis(foo)
  2           0 BUILD_LIST               0 #### build empty list
              3 DUP_TOP
              4 STORE_FAST               2 (_[1])
              7 LOAD_FAST                1 (iterable)
             10 GET_ITER
        >>   11 FOR_ITER                19 (to 33)
             14 STORE_FAST               3 (i)
             17 LOAD_FAST                2 (_[1])
             20 LOAD_FAST                0 (func)
             23 LOAD_FAST                3 (i)
             26 CALL_FUNCTION            1
             29 LIST_APPEND      #### stack[-2].append(stack[-1]); pop()
             30 JUMP_ABSOLUTE           11
        >>   33 DELETE_FAST              2 (_[1])
             36 RETURN_VALUE

It just builds an empty list, and appends whatever the iteration delivers.

Now look at this code, which has an 'if' in the list comprehension:

>>> def bar(func, iterable):
...     return [func(i) for i in iterable if i]
...
>>> import dis; dis.dis(bar)
  2           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_FAST               2 (_[1])
              7 LOAD_FAST                1 (iterable)
             10 GET_ITER
        >>   11 FOR_ITER                30 (to 44)
             14 STORE_FAST               3 (i)
             17 LOAD_FAST                3 (i)
             20 JUMP_IF_FALSE           17 (to 40)
             23 POP_TOP
             24 LOAD_FAST                2 (_[1])
             27 LOAD_FAST                0 (func)
             30 LOAD_FAST                3 (i)
             33 CALL_FUNCTION            1
             36 LIST_APPEND
             37 JUMP_ABSOLUTE           11
        >>   40 POP_TOP
             41 JUMP_ABSOLUTE           11
        >>   44 DELETE_FAST              2 (_[1])
             47 RETURN_VALUE
>>>

The same code, plus some code to avoid the LIST_APPEND.

In Python 3.X, you need to dig a little deeper:

>>> import dis
>>> def comprehension(f, iterable): return [f(i) for i in iterable]
...
>>> dis.dis(comprehension)
  1           0 LOAD_CLOSURE             0 (f)
              3 BUILD_TUPLE              1
              6 LOAD_CONST               1 (<code object <listcomp> at 0x00C4B8D
8, file "<stdin>", line 1>)
              9 MAKE_CLOSURE             0
             12 LOAD_FAST                1 (iterable)
             15 GET_ITER
             16 CALL_FUNCTION            1
             19 RETURN_VALUE
>>> dis.dis(comprehension.__code__.co_consts[1])
  1           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                18 (to 27)
              9 STORE_FAST               1 (i)
             12 LOAD_DEREF               0 (f)
             15 LOAD_FAST                1 (i)
             18 CALL_FUNCTION            1
             21 LIST_APPEND              2
             24 JUMP_ABSOLUTE            6
        >>   27 RETURN_VALUE
>>>

It's the same old schtick: start off with building an empty list, then iterate over the iterable, appending to the list as required. I see no preallocation here.

The optimisation that you are thinking about is used inside a single opcode e.g. the implementation of list.extend(iterable) can preallocate if iterable can accurately report its length. list.append(object) is given a single object, not an iterable.

like image 76
John Machin Avatar answered Oct 12 '22 11:10

John Machin


There is no difference in computational complexity between using an autoresizing array and preallocating an array. At worst, it costs about O(2N). See here:

Constant Amortized Time

The cost of the function calls plus whatever happens in your function is going to make this extra n insignificant. This isn't something you should worry about. Just use the list comprehension.

like image 39
Matt Anderson Avatar answered Oct 12 '22 12:10

Matt Anderson