Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the resizing factor of lists in Python

For example, ArrayLists in Java have a resizing factor of 2. When the array that the ArrayList is wrapped around runs out of space, all the elements of that array are transferred to a new array that is 2 times the size of the original array.

Since Python lists/arrays are naturally dynamic, what are their resizing factor? Or do they use some other method of scaling? If so, what's that method? What's its asymptotic runtime (big O)?

like image 836
TomLisankie Avatar asked Sep 06 '18 02:09

TomLisankie


2 Answers

For CPython, in particular, the overallocation during resize is implemented in objects/listobject.c. It is described as

mild, but ... enough to give linear-time amortized behavior over a long sequence of appends() in the presence of a poorly-performing system realloc().

When the list needs to be grown, it is grown to be approximately 112.5% of the necessary size. In particular, the actual size is new_size + new_size >> 3 + (newsize < 9 ? 3 : 6).

like image 96
nanofarad Avatar answered Nov 10 '22 08:11

nanofarad


To answer your other question: adding an item to the end of an ArrayList or similar takes amortized O(1) time as long as there is any factor k > 1 such that reallocation always makes an array that is k times as big as the previous one.

The actual factor used doesn't really matter as long as it's bigger than one. Alternative reallocation schemes might have a different complexity. If reallocation adds a constant amount, for example, then adding an item takes amortized O(N) time.

That's why all professionally written dynamically growing arrays grow by some factor each time. The factor provides a fixed upper bound on the proportion of wasted space ((k-1)/k), traded off against a fixed upper bound on the average number of times each item could be copied in a sequence of adds.

Lets say you're using factor k and you've just performed the Nth reallocation. That means you have added about k^N items. How many items have you copied? Let the be C. Then:

C = k^(N-1) + k(N-2) + ... + 1

kC - C = k^N + k^(N-1) - k^(N-1) + k^(N-2) - k^(N-2) ... - 1

kC - C = k^N - 1

C = (k^N-1) / (k-1)

The number of copies per item added, then is C/K^N, which is pretty much 1/(k-1).

So Java's factor of k=2 means there is about one copy per add operation, with up to 50% unused slots, and CPython's factor of k=1.125 means there are about 8 copies per add operation with up to 11% unused slots.

like image 29
Matt Timmermans Avatar answered Nov 10 '22 09:11

Matt Timmermans