I have a list, my_list (the list contains UTF-8 strings):

>>> len(my_list)
8777
>>> getsizeof(my_list)  # <-- note the size
77848
For some reason, a sorted copy (my_sorted_list = sorted(my_list)) uses more memory:

>>> len(my_sorted_list)
8777
>>> getsizeof(my_sorted_list)  # <-- note the size
79104
Why does sorted return a list that takes up more space in memory than the initial unsorted list?
As Ignacio points out, this is because Python allocates a bit more memory than is strictly required. It does this so that .append on a list runs in amortized O(1) time.
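You can observe that over-allocation directly with getsizeof (a quick sketch; the exact sizes and jump points are CPython implementation details and vary by version):

```python
import sys

# Append one element at a time and watch getsizeof: the reported size
# stays flat between occasional jumps, because each resize allocates
# spare capacity for future appends.
l = []
sizes = []
for i in range(20):
    l.append(i)
    sizes.append(sys.getsizeof(l))

grows = sum(1 for a, b in zip(sizes, sizes[1:]) if b > a)
print(f"{len(sizes)} appends, {grows} size increases")
```

Only a handful of the twenty appends actually change the reported size; the rest land in already-allocated slack.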
sorted creates a new list out of the sequence provided, sorts it in place, and returns it. To build the new list, Python extends an empty list with the one passed in; that is what produces the observed over-allocation (which happens after list_resize is called). You can confirm that the sorting itself isn't the culprit by using list.sort: the same algorithm is used, but no new list is created (the sort is performed in place). The sizes, of course, then don't differ.
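A quick way to check this (the absolute sizes will differ by platform and version, but the before/after comparison holds):

```python
import sys

l = [str(i) for i in range(8777)]   # stand-in for the question's list
before = sys.getsizeof(l)
l.sort()                            # in place: no new list, no resize
after = sys.getsizeof(l)
print(before == after)              # True: size unchanged by in-place sort
```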
It is worth noting that this difference mostly shows up when:

- The original list was created with a list comprehension (where, if space is available and the final append doesn't trigger a resize, the size can be smaller).
- The original list was built from a list literal. There a PyList_New is created based on the number of values on the stack, and no appends are made (the values are assigned directly to the underlying array), which doesn't trigger any resizes and keeps the size at its minimum:
So, with a list comprehension:

l = [i for i in range(10)]
getsizeof(l)          # 192
getsizeof(sorted(l))  # 200
Or a list literal:

l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
getsizeof(l)          # 144
getsizeof(sorted(l))  # 200
the sizes are smaller (more so with literals).
When creating a list through list(), the memory is always over-allocated; Python knows the size and preemptively over-allocates a bit based on it, anticipating future modifications:
l = list(range(10))
getsizeof(l)          # 200
getsizeof(sorted(l))  # 200
So you get no observed difference in the sizes of the list(s).
As a final note, I must point out that this behavior is specific to the C implementation of Python, i.e. CPython. It is a detail of how the language is implemented, and you shouldn't depend on it in any way.
Jython, IronPython, PyPy, and any other implementation might or might not show the same behavior.
The list resize operation over-allocates in order to amortize the cost of repeated appends, in contrast to a list whose exact size the compiler knows up front and preallocates exactly.
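That amortization is easy to see empirically: over many appends, only a small fraction trigger an actual resize (a sketch; the exact growth factor is a CPython implementation detail):

```python
import sys

# Count how often the allocation actually changes over many appends.
l = []
resizes = 0
prev = sys.getsizeof(l)
for i in range(10_000):
    l.append(i)
    size = sys.getsizeof(l)
    if size != prev:
        resizes += 1
        prev = size

# Resizes are rare relative to appends, so the per-append cost
# averages out to O(1).
print(f"10000 appends caused {resizes} resizes")
```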