Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pre-allocating a list of None

Suppose you want to write a function which yields a list of objects, and you know in advance the length n of such list.

In python the list supports indexed access in O(1), so it is arguably a good idea to pre-allocate the list and access it with indexes instead of allocating an empty list and using the append() method. This is because we avoid the burden of expanding the whole list if the space is not enough.

If I'm using python, probably performances are not that relevant in any case, but what is the better way of pre-allocating a list?

I know these possible candidates:

  • [None] * n → allocating two lists
  • [None for x in range(n)] — or xrange in python2 → building another object

Is one significantly better than the other?

What if we are in the case n = len(input)? Since input exists already, would [None for x in input] have better performances w.r.t. [None] * len(input)?

like image 869
Dacav Avatar asked Mar 06 '14 13:03

Dacav


People also ask

What is None]* n for in Python?

The None keyword is used to define a null value, or no value at all. None is not the same as 0, False, or an empty string. None is a data type of its own (NoneType) and only None can be None.

Should I preallocate array Python?

You don't need to preallocate anything. Behind the scenes, the list type will periodically allocate more space than it needs for its immediate use to amortize the cost of resizing the underlying array across multiple updates. For readability, consider if a list comprehension is suitable.

Is none same as empty list?

Usually, an empty list has a different meaning than None ; None means no value while an empty list means zero values.

How is memory allocated to a list in Python?

Python keeps a pointer to this array and the array's length is stored in a list head structure. This makes indexing of a list independent of the size of the list or the value of the index. When items are appended or inserted the array of references is resized.


2 Answers

In between those two options the first one is clearly better as no Python for loop is involved.

>>> %timeit [None] * 100
1000000 loops, best of 3: 469 ns per loop
>>> %timeit [None for x in range(100)] 
100000 loops, best of 3: 4.8 us per loop

Update:

And list.append has an O(1) complexity too, it might be a better choice than pre-creating list if you assign the list.append method to a variable.

>>> n = 10**3
>>> %%timeit
lis = [None]*n           
for _ in range(n):
    lis[_] = _
... 
10000 loops, best of 3: 73.2 us per loop
>>> %%timeit
lis = []                 
for _ in range(n):
    lis.append(_)
... 
10000 loops, best of 3: 92.2 us per loop
>>> %%timeit
lis = [];app = lis.append
for _ in range(n):
    app(_)
... 
10000 loops, best of 3: 59.4 us per loop

>>> n = 10**6
>>> %%timeit
lis = [None]*n
for _ in range(n):
    lis[_] = _
... 
10 loops, best of 3: 106 ms per loop
>>> %%timeit
lis = []      
for _ in range(n):
    lis.append(_)
... 
10 loops, best of 3: 122 ms per loop
>>> %%timeit
lis = [];app = lis.append
for _ in range(n):
    app(_)
... 
10 loops, best of 3: 91.8 ms per loop
like image 172
Ashwini Chaudhary Avatar answered Oct 22 '22 08:10

Ashwini Chaudhary


When you append an item to a list, Python 'over-allocates', see the source-code of the list object. This means that for example when adding 1 item to a list of 8 items, it actually makes room for 8 new items, and uses only the first one of those. The next 7 appends are then 'for free'.

In many languages (e.g. old versions of Matlab, the newer JIT might be better) you are always told that you need to pre-allocate your vectors, since appending during a loop is very expensive. In the worst case, appending of a single item to a list of length n can cost O(n) time, since you might have to create a bigger list and copy all the existing items over. If you need to do this on every iteration, the overall cost of adding n items is O(n^2), ouch. Python's pre-allocation scheme spreads the cost of growing the array over many single appends (see amortized costs), effectively making the cost of a single append O(1) and the overall cost of adding n items O(n).

Additionally, the overhead of the rest of your Python code is usually so large, that the tiny speedup that can be obtained by pre-allocating is insignificant. So in most cases, simply forget about pre-allocating, unless your profiler tells you that appending to a list is a bottleneck.

The other answers show some profiling of the list preallocation itself, but this is useless. The only thing that matters is profiling your complete code, with all your calculations inside your loop, with and without pre-allocation. If my prediction is right, the difference is so small that the computation time you win is dwarfed by the time spent thinking about, writing and maintaining the extra lines to pre-allocate your list.

like image 28
Bas Swinckels Avatar answered Oct 22 '22 07:10

Bas Swinckels