Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does this key class for sorting heterogeneous sequences behave oddly?

Python 3.x's sorted() function cannot be relied on to sort heterogeneous sequences, because most pairs of distinct types are unorderable (numeric types like int, float, decimal.Decimal etc. being an exception):

Python 3.4.2 (default, Oct  8 2014, 08:07:42) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> sorted(["one", 2.3, "four", -5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: float() < str()

In contrast, comparisons between objects that have no natural order are arbitrary but consistent in Python 2.x, so sorted() works:

Python 2.7.8 (default, Aug  8 2014, 14:55:30) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> sorted(["one", 2.3, "four", -5])
[-5, 2.3, 'four', 'one']

In order to replicate Python 2.x's behaviour in Python 3.x, I wrote a class to use as the key parameter to sorted(), which relies on the fact that sorted() is guaranteed to use only less-than comparisons:

class motley:

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        try:
            return self.value < other.value
        except TypeError:
            return repr(type(self.value)) < repr(type(other.value))

Example usage:

>>> sorted(["one", 2.3, "four", -5], key=motley)
[-5, 2.3, 'four', 'one']

So far, so good.

However, I've noticed a surprising behaviour when sorted(s, key=motley) is called with certain sequences containing complex numbers:

>>> sorted([0.0, 1, (1+0j), False, (2+3j)], key=motley)
[(1+0j), 0.0, False, (2+3j), 1]

I would have expected 0.0, False and 1 to be in one group (because they are mutually orderable), and (1+0j) and (2+3j) in another (because they are of the same type). The fact that the complex numbers in this result are not only separated from each other, but one of them is sitting in the middle of a group of objects that are comparable with each other but not with it, is somewhat perplexing.

What's going on here?

like image 224
Zero Piraeus Avatar asked Oct 25 '14 21:10

Zero Piraeus


1 Answers

You do not know what order the comparisons are done in, or even which items are compared, which means you can't really know what effect your __lt__ will have. Your defined __lt__ sometimes depends on the actual values, and sometimes on the string representations of the types, but both versions may be used for the same object in the course of the sort. This means that your ordering is not determined solely by the objects in the list, but also may depend on their initial order. This in turn means that just because objects are mutually comparable does not mean they will be sorted together; they may be "blocked" by an incomparable object between them.

You can get an inkling of what is going on by putting some debugging prints in to see what it's comparing:

class motley:

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        fallback = False
        try:
            result = self.value < other.value
        except TypeError:
            fallback = True
            result = repr(type(self.value)) < repr(type(other.value))
        symbol = "<" if result else ">"
        print(self.value, symbol, other.value, end="")
        if fallback:
            print(" -- because", repr(type(self.value)), symbol, repr(type(other.value)))
        else:
            print()
        return result

Then:

>>> sorted([0.0, 1, (1+0j), False, (2+3j)], key=motley)
1 > 0.0
(1+0j) < 1 -- because <class 'complex'> < <class 'int'>
(1+0j) < 1 -- because <class 'complex'> < <class 'int'>
(1+0j) < 0.0 -- because <class 'complex'> < <class 'float'>
False > 0.0
False < 1
(2+3j) > False -- because <class 'complex'> > <class 'bool'>
(2+3j) < 1 -- because <class 'complex'> < <class 'int'>
[(1+0j), 0.0, False, (2+3j), 1]

You can see, for instance, that the type-based ordering is used for comparing the complex number to 1, but not for comparing 1 and 0. Likewise 0.0 < False for "normal" reasons, but 2+3j > False for type-name-based reasons.

The result is that it sorts 1+0j to the beginning, and then leaves 2+3j where it is above False. It never even attempts to compare the two complex numbers to each other, and the only item they are both compared to is 1.

More generally, your approach can lead to an intransitive ordering with appropriate choices for the names of the types involved. For instance, if you define classes A, B, and C, such that A and C can be compared, but they raise exceptions when comparing to B, then by creating objects a, b and c (from the respective classes) such that c < a, you can create a cycle a < b < c < a. a < b < c will be true because the classes will be compared based on their names, but c < a since these types can be directly compared. With an intransitive ordering, there is no hope of a "correct" sorted order.

You can even do this with builtin types, although it requires getting a little creative to think of objects whose type names are in the right alphabetical sequence:

>>> motley(1.0) < motley(lambda: 1) < motley(0) < motley(1.0)
True

(Because 'float' < 'function' :-)

like image 176
BrenBarn Avatar answered Nov 15 '22 13:11

BrenBarn