I sorted four similar lists. List d
consistently takes much longer than the others, which all take about the same time:
a: 33.5 ms
b: 33.4 ms
c: 36.4 ms
d: 110.9 ms
Why is that?
Test script (Attempt This Online!):
from timeit import repeat
n = 2_000_000
a = [i // 1 for i in range(n)] # [0, 1, 2, 3, ..., 1_999_999]
b = [i // 2 for i in range(n)] # [0, 0, 1, 1, 2, 2, ..., 999_999]
c = a[::-1] # [1_999_999, ..., 3, 2, 1, 0]
d = b[::-1] # [999_999, ..., 2, 2, 1, 1, 0, 0]
for name in 'abcd':
lst = globals()[name]
time = min(repeat(lambda: sorted(lst), number=1))
print(f'{name}: {time*1e3 :5.1f} ms')
As alluded to in the comments by btilly and Amadan, this is due to how the Timsort sorting algorithm works. Detailed description of the algorithm is here.
Timsort speeds up operation on partially sorted arrays by identifying runs of sorted elements.
A run is either "ascending", which means non-decreasing:
a0 <= a1 <= a2 <= ...
or "descending", which means strictly decreasing:
a0 > a1 > a2 > ...
Note that a run is always at least 2 long, unless we start at the array's last element.
Your arrays a, b and c each consist of just one run. The array d has 1 million runs.
The reason why the descending run cannot be >=
is to make the sort stable, i.e. keep the order of equal elements:
The definition of descending is strict, because the main routine reverses a descending run in-place, transforming a descending run into an ascending run. Reversal is done via the obvious fast "swap elements starting at each end, and converge at the middle" method, and that can violate stability if the slice contains any equal elements. Using a strict definition of descending ensures that a descending run contains distinct elements.
Python 3.11 has slightly improved version of timsort, sometimes called powersort, but it uses the same run detection and thus has the same performance.
As @jpa's answer explained, a
, b
and c
have a single "run" while d
has a million. We can also see the effect of that by counting the comparisons:
a: 1999999 comparisons, 1.000 per element
b: 1999999 comparisons, 1.000 per element
c: 1999999 comparisons, 1.000 per element
d: 10710650 comparisons, 5.355 per element
A single long run means we only compare each pair of neighbors once, that's all. For n elements, that's n-1 comparisons.
Since a
and b
are ascending, they're already sorted, and nothing further is done. Since c
is descending, it simply gets reversed, and then nothing further is done.
For d
, the two-element runs first get combined into runs of 32 to 64 elements using binary insertion sort. And then these longer runs get merged again and again into larger and larger runs until the whole list is sorted.
I tried various values of n. The number of comparisons for d
hovered around 5 per element, going up to ~5.3 until n reached a power of 2, after which the number of comparisons fell to ~4.7 again. In general, d
requires O(n) comparisons. It doesn't need n log n comparisons, because the merging uses "galloping" (an exponential search), which in this case needs only logarithmically many comparisons. So unlike ordinary merge sort in general cases, the recurrence relation for the number of comparisons here isn't T(n) = 2T(n/2) + n
but T(n) = 2T(n/2) + log(n)
, which is O(n).
The runtime however isn't O(n), as there are still n log n moves of elements. But that's low-level and very fast, relatively fast compared to the O(n) slower comparisons. If you measure time for various n, you might very well think it's O(n) time.
The script for counting (Attempt This Online!):
n = 2_000_000
a = [i // 1 for i in range(n)]
b = [i // 2 for i in range(n)]
c = a[::-1]
d = b[::-1]
class Int:
def __init__(self, value):
self.value = value
def __lt__(self, other):
global comparisons
comparisons += 1
return self.value < other.value
for name in 'abcd':
lst = globals()[name]
comparisons = 0
sorted(map(Int, lst))
print(f'{name}: {comparisons} comparisons, {comparisons/len(lst) :5.3f} per element')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With