What is the precise rule used in Python in order to sort lists where the elements are lists? Can this be expressed as a 'key' or 'cmp' function? The problems comes from the fact that there are two things to consider: length and values at their position.
sorted([
[ 0, 1, 2, 3 ], # 1st line: longer list
[ 0, 1 ], # 2nd line: shorter list
[ 0, 2 ] # 3rd line: suspected last
])
Is it safe to assume that the second line will sort before the first? Is it safe to assume that the third line will sort always last?
Note, this is not about stability! The specific case above behaves that way as described. But, can the rules there be considered to be general? What are the precise rules which python applies here?
Relying on the following definition Lexicographical Order (Thanks to Ashniwi):
To compare sequences of different lengths, the shorter sequence is usually padded at the end with enough "blanks" (a special symbol that is treated as smaller than every element of A). This way of comparing sequences of different lengths is always used in dictionaries. However, in combinatorics, another convention is frequently used, whereby a shorter sequence is always smaller than a longer sequence. This variant of the lexicographical order is sometimes called shortlex order.
Is Python using 'shortlex order'. Where is proof for that assumption, beyond practical examples?
By default, sorted
uses the __lt__
method of the items that are compared. And lists with comparable elements are compared lexicographically, according to the Python documentation. So yes, the language guarantees that in the shorter string will be sorted before the longer string.
Quoting from docs:
In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length.
Lexicographical comparison between built-in collections works as follows:
[1,2] == (1,2)
is false because the type is not the same).[1,2,x] <= [1,2,y]
has the same value as x <= y
). If a corresponding element does not exist, the shorter collection is ordered first (for example, [1,2] < [1,2,3]
is true).The basic comparison done for lists can be expressed using this function:
def cmp(list_1, list_2):
length_1 = len(list_1)
length_2 = len(list_2)
min_length = min(length_1, length_2)
# Compare individual items till there's a different item found
for i in xrange(min_length):
if list_1[i] > list_2[i]:
return 1
elif list_1[i] < list_2[i]:
return -1
# All items were same so far, let's compare sizes.
if length_1 > length_2:
return 1
elif length_1 < length_2:
return -1
elif length_1 == length_2:
return 0
Demo:
>>> lst = [[ 0, 1, 2, 3 ], [ 0, 1 ], [ 0, 2 ]]
>>> print sorted(lst) == sorted(lst, cmp=cmp)
True
Related CPython code for reference:
/* Search for the first index where items are different */
for (i = 0; i < Py_SIZE(vl) && i < Py_SIZE(wl); i++) {
int k = PyObject_RichCompareBool(vl->ob_item[i],
wl->ob_item[i], Py_EQ);
if (k < 0)
return NULL;
if (!k)
break;
}
if (i >= Py_SIZE(vl) || i >= Py_SIZE(wl)) {
/* No more items to compare -- compare sizes */
Py_ssize_t vs = Py_SIZE(vl);
Py_ssize_t ws = Py_SIZE(wl);
int cmp;
PyObject *res;
switch (op) {
case Py_LT: cmp = vs < ws; break;
case Py_LE: cmp = vs <= ws; break;
case Py_EQ: cmp = vs == ws; break;
case Py_NE: cmp = vs != ws; break;
case Py_GT: cmp = vs > ws; break;
case Py_GE: cmp = vs >= ws; break;
default: return NULL; /* cannot happen */
}
if (cmp)
res = Py_True;
else
res = Py_False;
Py_INCREF(res);
return res;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With