I am looking for the Python equivalent of the C++ vector::reserve(). I don't know how big the list is going to be ahead of time, but I know it's going to be fairly large, and I want to avoid as many resizings as possible as the list is being grown inside a deep inner loop.
The only solution I've come up with so far is very cumbersome compared to the vector::reserve() idiom. That solution is to pre-create the list using [None]*K, track the size of the list in a separate counter, append or set items into the list as needed, then copy a slice of the list once it's fully constructed. Is there any alternative?
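For concreteness, the idiom described above might be sketched like this (the function name, the `capacity` guess, and the item source are all hypothetical, just for illustration):

```python
def build_preallocated(items, capacity):
    """Pre-size the list, track the fill count separately, trim once at the end."""
    buf = [None] * capacity   # pre-create with a guessed capacity
    n = 0                     # separate counter tracking the real size
    for item in items:
        if n < capacity:
            buf[n] = item     # set into a pre-allocated slot
        else:
            buf.append(item)  # fall back to appending past capacity
        n += 1
    return buf[:n]            # copy the filled slice once, at the end

result = build_preallocated(range(5), capacity=10)
# result == [0, 1, 2, 3, 4]
```

This shows why the idiom feels cumbersome: you carry a counter, handle overflow past the guessed capacity, and pay for one final slice copy.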
Similarly to std::vector, CPython's list already preallocates more elements than are required and then grows that allocated space in a manner that gives O(1) amortized appends. I would therefore leave it at that until profiling proves this is indeed a bottleneck.
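You can observe the over-allocation directly: sys.getsizeof reports the list's currently allocated size, which jumps occasionally rather than changing on every append (the exact growth pattern is a CPython implementation detail, so the numbers vary by version):

```python
import sys

x = []
sizes = set()
for i in range(100):
    x.append(i)
    sizes.add(sys.getsizeof(x))

# Far fewer distinct allocation sizes than appends: most appends
# reuse spare capacity from an earlier over-allocation.
print(len(sizes))
```

The number of distinct sizes is much smaller than 100, which is exactly the reserve-like behavior happening behind the scenes.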
Edit: You mention in the comments that you've already done the profiling. In that case, pre-allocating with [None]*n is a sensible thing to try, to see whether the repeated reallocations really are the bottleneck.
If your array is numeric, I would encourage you to take a look at NumPy.
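For numeric data, NumPy lets you allocate the whole buffer once and fill it in place, which is the closest analogue to reserve() plus indexed writes (a sketch assuming you know the final size n up front):

```python
import numpy as np

n = 100000
x = np.empty(n, dtype=np.int64)  # single allocation; contents uninitialized
for i in range(n):               # fill in place; no resizing ever happens
    x[i] = i

# In practice you'd vectorize the fill entirely, e.g.:
#     x = np.arange(n, dtype=np.int64)
```

The real win with NumPy usually comes from replacing the Python-level loop with a vectorized operation, not just from the preallocation.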
For the heck of it, I did some performance testing:
def foo(n):
    x = []
    for y in xrange(n):  # xrange on Python 2; use range on Python 3
        x.append(y)

def bar(n):
    x = [None] * n
    for y in xrange(n):
        x[y] = y

def baz(n):
    # This is obviously silly; we could just do range(n),
    # but this way makes for a fairer test
    x = [y for y in xrange(n)]
>>> timeit.timeit(lambda:foo(1000000), number=10)
1.761765391970215
>>> timeit.timeit(lambda:bar(1000000), number=10)
0.79829286962450396
>>> timeit.timeit(lambda:baz(1000000), number=10)
0.9904259479906159
>>> timeit.timeit(lambda:foo(10000), number=1000)
1.3354106457664443
>>> timeit.timeit(lambda:bar(10000), number=1000)
0.70596751821813086
>>> timeit.timeit(lambda:baz(10000), number=1000)
0.58049759117432131