I'd like to sort a list and observe/visualize how Python's sorting algorithm Timsort is moving the elements around.
First attempt: A subclass of list
which prints itself after each change:
class List(list):
def __setitem__(self, index, value):
list.__setitem__(self, index, value)
print(self)
That works when I change elements myself, but during sort
... nothing:
>>> a = List([None] * 2)
>>> a[0] = 'S'
['S', None]
>>> a[1] = 'P'
['S', 'P']
>>> a.sort()
>>>
Second attempt: Print the list (in global variable a
) at each comparison of elements:
class Str(str):
def __lt__(self, other):
print(a)
return other > self
That does something, but the list is always... empty?
>>> a = list(map(Str, 'test'))
>>> a.sort()
[]
[]
[]
[]
[]
[]
>>>
Why do these attempts fail, and is there a way to observe what Timsort is doing?
Yes, it can be observed. I did, here's a visualization:
And another one with more elements:
Why do your attempts fail?
list
subclass fails because sort
works "inside" the list at C code level, not going through the public Python __setitem__
interface./* The list is temporarily made empty, so that mutations performed * by comparison functions can't affect the slice of memory we're * sorting (allowing mutations during sorting is a core-dump * factory, since ob_item may change).
But then how can we observe the list during the sorting, if it's empty? Easy: We don't. Not during the sorting. But after partial sortings.
Timsort is:
/* [...] Even in case of error, the * list will be some permutation of its input state (nothing is lost or * duplicated).
So the idea is:
Code doing that (except it doesn't show duplicate states, which happens when Timsort makes several comparisons before the next move):
import string
from random import shuffle
from itertools import count
class Str(str):
def __lt__(self, other):
global allowed_comparisons
if allowed_comparisons == 0:
raise Exception
allowed_comparisons -= 1
return other > self
a = list(string.ascii_letters + string.digits + '<>')
shuffle(a)
shown = None
for allowed_comparisons in count():
try:
b = list(map(Str, a))
b.sort()
except:
pass
state = ''.join(b)
if state != shown:
print(state)
shown = state
if list(state) == sorted(state):
break
Output, discussed below:
k1z5Qi>jCVsfRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
1kz5Qi>jCVsfRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15kzQi>jCVsfRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15Qkzi>jCVsfRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15Qikz>jCVsfRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>QikzjCVsfRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>QijkzCVsfRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>CQijkzVsfRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>CQVijkzsfRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>CQVijkszfRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>CQVfijkszRbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>CQRVfijkszbBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>CQRVbfijkszBWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>BCQRVbfijkszWSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>BCQRVWbfijkszSnJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>BCQRSVWbfijksznJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>BCQRSVWbfijknszJOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>BCJQRSVWbfijknszOD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>BCJOQRSVWbfijknszD6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
15>BCDJOQRSVWbfijknsz6AZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
156>BCDJOQRSVWbfijknszAZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
156>ABCDJOQRSVWbfijknszZ3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
156>ABCDJOQRSVWZbfijknsz3cGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDJOQRSVWZbfijknszcGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDJOQRSVWZbcfijknszGTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGJOQRSVWZbcfijknszTaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGJOQRSTVWZbcfijknszaIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGJOQRSTVWZabcfijknszIrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknszrvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrszvw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvzw0pePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz0wpePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz0pwePH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz0epwPH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz0PepwH94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz0HPepw94UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz09HPepw4UgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz049HPepwUgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz049HPUepwgqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz049HPUegpwqEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz049HPUegpqwEK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz049EHPUegpqwK2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz049EHKPUegpqw2FNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz0249EHKPUegpqwFNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz0249EFHKPUegpqwNYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz0249EFHKNPUegpqwYL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz0249EFHKNPUYegpqwL7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz0249EFHKLNPUYegpqw7Xdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz02479EFHKLNPUYegpqwXdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz02479EFHKLNPUXYegpqwdltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz02479EFHKLNPUXYdegpqwltoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz02479EFHKLNPUXYdeglpqwtoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz02479EFHKLNPUXYdeglpqtwoxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz02479EFHKLNPUXYdeglopqtwxy8uM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz024789EFHKLNPUXYdeglopqtwxyuM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz024789EFHKLNPUXYdeglopqtuwxyM<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz024789EFHKLMNPUXYdeglopqtuwxy<hm
1356>ABCDGIJOQRSTVWZabcfijknrsvz024789<EFHKLMNPUXYdeglopqtuwxyhm
1356>ABCDGIJOQRSTVWZabcfijknrsvz024789<EFHKLMNPUXYdeghlopqtuwxym
1356>ABCDGIJOQRSTVWZabcfijknrsvz024789<EFHKLMNPUXYdeghlmopqtuwxy
01356>ABCDGIJOQRSTVWZabcfijknrsvz24789<EFHKLMNPUXYdeghlmopqtuwxy
012356>ABCDGIJOQRSTVWZabcfijknrsvz4789<EFHKLMNPUXYdeghlmopqtuwxy
0123456>ABCDGIJOQRSTVWZabcfijknrsvz789<EFHKLMNPUXYdeghlmopqtuwxy
01234567>ABCDGIJOQRSTVWZabcfijknrsvz89<EFHKLMNPUXYdeghlmopqtuwxy
012345678>ABCDGIJOQRSTVWZabcfijknrsvz9<EFHKLMNPUXYdeghlmopqtuwxy
0123456789>ABCDGIJOQRSTVWZabcfijknrsvz<EFHKLMNPUXYdeghlmopqtuwxy
0123456789<>ABCDGIJOQRSTVWZabcfijknrsvzEFHKLMNPUXYdeghlmopqtuwxy
0123456789<>ABCDEGIJOQRSTVWZabcfijknrsvzFHKLMNPUXYdeghlmopqtuwxy
0123456789<>ABCDEFGIJOQRSTVWZabcfijknrsvzHKLMNPUXYdeghlmopqtuwxy
0123456789<>ABCDEFGHIJOQRSTVWZabcfijknrsvzKLMNPUXYdeghlmopqtuwxy
0123456789<>ABCDEFGHIJKOQRSTVWZabcfijknrsvzLMNPUXYdeghlmopqtuwxy
0123456789<>ABCDEFGHIJKLOQRSTVWZabcfijknrsvzMNPUXYdeghlmopqtuwxy
0123456789<>ABCDEFGHIJKLMOQRSTVWZabcfijknrsvzNPUXYdeghlmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOQRSTVWZabcfijknrsvzPUXYdeghlmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTVWZabcfijknrsvzUXYdeghlmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWZabcfijknrsvzXYdeghlmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXZabcfijknrsvzYdeghlmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcfijknrsvzdeghlmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdfijknrsvzeghlmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefijknrsvzghlmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefgijknrsvzhlmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijknrsvzlmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklnrsvzmopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnrsvzopqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnorsvzpqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnoprsvzqtuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrsvztuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstvzuwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvzwxy
0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
In the above output, note there are three stages:
1356>ABCDGIJOQRSTVWZabcfijknrsvz
. Timsort does that with binary insertion sort. Each line in the output corresponds to one insertion.024789<EFHKLMNPUXYdeghlmopqtuwxy
. Again with binary insertion sort.1356>ABCDGIJOQRSTVWZabcfijknrsvz024789<EFHKLMNPUXYdeghlmopqtuwxy
01356>ABCDGIJOQRSTVWZabcfijknrsvz24789<EFHKLMNPUXYdeghlmopqtuwxy
0
from the second half gets moved to the front, the list looks like this:0--------------------------------24789<EFHKLMNPUXYdeghlmopqtuwxy
0
to the left and shifting the entire first half to the right by one index. That would be inefficient, and it's not what happens when Timsort runs normally, without my exceptions.You can also see these three stages in my animated image at the top. And in this "screensaver" version, where I scroll down through the above output. I think it looks cool (was hoping it would look somewhat like Matrix code rain) but I find it less clear:
Note that here the third stage merges from right to left (with the right half moved out to temp space), which Timsort does when that's better.
Code for generating that image using Pillow, after storing the states in list states
instead of printing them:
from PIL import Image, ImageDraw
images = []
for i in range(len(states) - 14):
img = Image.new('RGB', (384, 225), (0, 0, 0))
text = '\n'.join(states[i:i+15])
d = ImageDraw.Draw(img)
d.multiline_text((0,0), text, fill=(0, 200, 0))
img = img.resize((384*2, 225*2), 0)
images.append(img)
images[0].save('timatrix.png', save_all=True, append_images=images[1:],
optimize=False, duration=100, loop=0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With