Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is a sorted list bigger than an unsorted list

Tags:

I have a list my_list (the list contains utf8 strings):

>>> len(my_list) 8777 >>> getsizeof(my_list)                     #  <-- note the size 77848 

For some reason, a sorted list (my_sorted_list = sorted(my_list)) uses more memory :

>>> len(my_sorted_list) 8777 >>> getsizeof(my_sorted_list)              #  <-- note the size 79104 

Why is sorted returning a list that takes more space in memory than the initial unsorted list?

like image 358
jcuenod Avatar asked Oct 29 '16 03:10

jcuenod


People also ask

What does it mean when a list is sorted?

(data structure) Definition: A list whose items are kept sorted. See also sorted array, ordered linked list. Note: Often the term "sorted list" is short for "ordered linked list".

Does sorted change the list?

No, sorted creates a new list.

Does sorted () modify the original list?

The sort() method sorts the original list in place. It means that the sort() method modifies the order of elements in the list. By default, the sort() method sorts the elements of a list using the less-than operator ( < ).

Is sort faster than sorted?

sort is slightly faster than sorted and consumes around 24% less memory. However, keep in mind that list. sort is only implemented for lists, whereas sorted accepts any iterable.


2 Answers

As Ignacio points out, this is due to Python allocating a bit more memory than required. This is done in order to perform O(1) .appends on lists.

sorted creates a new list out of the sequence provided, sorts it in place and returns it. To create the new list, Python extends an empty sized list with the one passed; that results in the observed over-allocation (which happens after calling list_resize). You can corroborate the fact that sorting isn't the culprit by using list.sort; the same algorithm is used without a new list getting created (or, as it's known, it's performed in-place). The sizes there, of course, don't differ.

It is worth noting that this difference is mostly present when:

  • The original list was created with a list-comp (where, if space is available and the final append doesn't trigger a resize, the size is smaller).

  • When list literals are used. There a PyList_New is created based on the number of values on the stack and no appends are made. Direct assigning to the underlying array is performed) which doesn't trigger any resizes and keeps size to its minimum:

So, with a list-comp:

l = [i for i in range(10)]  getsizeof(l)          # 192 getsizeof(sorted(l))  # 200 

Or a list literal:

l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  getsizeof(l)          # 144 getsizeof(sorted(l))  # 200 

the sizes are smaller (more so with use of literals).

When creating through list, the memory is always over-allocated; Python knows the sizes and preempts future modifications by over-allocating a bit based on the size:

l = list(range(10))  getsizeof(l)          # 200 getsizeof(sorted(l))  # 200 

So you get no observed difference in the sizes of the list(s).


As a final note, I must point out that this is behavior specific the C implementation of Python i.e CPython. It's a detail of how the language was implemented and as such, you shouldn't depend on it in any wacky way.

Jython, IronPython, PyPy and any other implementation might/might not have the same behavior.

like image 109
Dimitris Fasarakis Hilliard Avatar answered Oct 27 '22 09:10

Dimitris Fasarakis Hilliard


The list resize operation overallocates in order to amortize appending to the list versus starting with a list preallocated by the compiler.

like image 43
Ignacio Vazquez-Abrams Avatar answered Oct 27 '22 09:10

Ignacio Vazquez-Abrams