How can I get 2.x-like sorting behaviour in Python 3.x?

Tags:

I'm trying to replicate (and if possible improve on) Python 2.x's sorting behaviour in 3.x, so that mutually orderable types like int, float etc. are sorted as expected, and mutually unorderable types are grouped within the output.

Here's an example of what I'm talking about:

>>> sorted([0, 'one', 2.3, 'four', -5])  # Python 2.x [-5, 0, 2.3, 'four', 'one']

>>> sorted([0, 'one', 2.3, 'four', -5])  # Python 3.x Traceback (most recent call last):   File "<stdin>", line 1, in <module> TypeError: unorderable types: str() < int()

My previous attempt at this, using a class for the key parameter to sorted() (see Why does this key class for sorting heterogeneous sequences behave oddly?) is fundamentally broken, because its approach of

Trying to compare values, and
If that fails, falling back to comparing the string representation of their types

can lead to intransitive ordering, as explained by BrenBarn's excellent answer.

A naïve approach, which I initially rejected without even trying to code it, would be to use a key function that returns a (type, value) tuple:

def motley(value):     return repr(type(value)), value

However, this doesn't do what I want. In the first place, it breaks the natural ordering of mutually orderable types:

>>> sorted([0, 123.4, 5, -6, 7.89]) [-6, 0, 5, 7.89, 123.4] >>> sorted([0, 123.4, 5, -6, 7.89], key=motley) [7.89, 123.4, -6, 0, 5]

Secondly, it raises an exception when the input contains two objects of the same intrinsically unorderable type:

>>> sorted([{1:2}, {3:4}], key=motley) Traceback (most recent call last):   File "<stdin>", line 1, in <module> TypeError: unorderable types: dict() < dict()

... which admittedly is the standard behaviour in both Python 2.x and 3.x – but ideally I'd like such types to be grouped together (I don't especially care about their ordering, but it would seem in keeping with Python's guarantee of stable sorting that they retain their original order).

I can work around the first of these problems for numeric types by special-casing them:

from numbers import Real from decimal import Decimal  def motley(value):     numeric = Real, Decimal     if isinstance(value, numeric):         typeinfo = numeric     else:         typeinfo = type(value)     return repr(typeinfo), value

... which works as far as it goes:

>>> sorted([0, 'one', 2.3, 'four', -5], key=motley) [-5, 0, 2.3, 'four', 'one']

... but doesn't account for the fact that there may be other distinct (possibly user-defined) types which are mutually orderable, and of course still fails with intrinsically unorderable types:

>>> sorted([{1:2}, {3:4}], key=motley) Traceback (most recent call last):   File "<stdin>", line 1, in <module> TypeError: unorderable types: dict() < dict()

Is there another approach which solves both the problem of arbitrary, distinct-but-mutually-orderable types and that of intrinsically unorderable types?

690

asked Oct 26 '14 16:10

Zero Piraeus

2 Answers

Stupid idea: make a first pass to divide all the different items in groups that can be compared between each other, sort the individual groups and finally concatenate them. I assume that an item is comparable to all members of a group, if it is comparable with the first member of a group. Something like this (Python3):

import itertools  def python2sort(x):     it = iter(x)     groups = [[next(it)]]     for item in it:         for group in groups:             try:                 item < group[0]  # exception if not comparable                 group.append(item)                 break             except TypeError:                 continue         else:  # did not break, make new group             groups.append([item])     print(groups)  # for debugging     return itertools.chain.from_iterable(sorted(group) for group in groups)

This will have quadratic running time in the pathetic case that none of the items are comparable, but I guess the only way to know that for sure is to check all possible combinations. See the quadratic behavior as a deserved punishment for anyone trying to sort a long list of unsortable items, like complex numbers. In a more common case of a mix of some strings and some integers, the speed should be similar to the speed of a normal sort. Quick test:

In [19]: x = [0, 'one', 2.3, 'four', -5, 1j, 2j,  -5.5, 13 , 15.3, 'aa', 'zz']  In [20]: list(python2sort(x)) [[0, 2.3, -5, -5.5, 13, 15.3], ['one', 'four', 'aa', 'zz'], [1j], [2j]] Out[20]: [-5.5, -5, 0, 2.3, 13, 15.3, 'aa', 'four', 'one', 'zz', 1j, 2j]

It seems to be a 'stable sort' as well, since the groups are formed in the order the incomparable items are encountered.

160

answered Oct 01 '22 12:10

Bas Swinckels

This answer aims to faithfully re-create the Python 2 sort order, in Python 3, in every detail.

The actual Python 2 implementation is quite involved, but object.c's default_3way_compare does the final fallback after instances have been given a chance to implement normal comparison rules. This is after individual types have been given a chance to compare (via the __cmp__ or __lt__ hooks).

Implementing that function as pure Python in a wrapper, plus emulating the exceptions to the rules (dict and complex numbers specifically) gives us the same Python 2 sorting semantics in Python 3:

from numbers import Number   # decorator for type to function mapping special cases def per_type_cmp(type_):     try:         mapping = per_type_cmp.mapping     except AttributeError:         mapping = per_type_cmp.mapping = {}     def decorator(cmpfunc):         mapping[type_] = cmpfunc         return cmpfunc     return decorator   class python2_sort_key(object):     _unhandled_types = {complex}      def __init__(self, ob):        self._ob = ob      def __lt__(self, other):         _unhandled_types = self._unhandled_types         self, other = self._ob, other._ob  # we don't care about the wrapper          # default_3way_compare is used only if direct comparison failed         try:             return self < other         except TypeError:             pass          # hooks to implement special casing for types, dict in Py2 has         # a dedicated __cmp__ method that is gone in Py3 for example.         for type_, special_cmp in per_type_cmp.mapping.items():             if isinstance(self, type_) and isinstance(other, type_):                 return special_cmp(self, other)          # explicitly raise again for types that won't sort in Python 2 either         if type(self) in _unhandled_types:             raise TypeError('no ordering relation is defined for {}'.format(                 type(self).__name__))         if type(other) in _unhandled_types:             raise TypeError('no ordering relation is defined for {}'.format(                 type(other).__name__))          # default_3way_compare from Python 2 as Python code         # same type but no ordering defined, go by id         if type(self) is type(other):             return id(self) < id(other)          # None always comes first         if self is None:             return True         if other is None:             return False          # Sort by typename, but numbers are sorted before other types         self_tname = '' if isinstance(self, Number) else type(self).__name__         other_tname = '' if isinstance(other, Number) else type(other).__name__          if self_tname != other_tname:             return self_tname < other_tname          # same typename, or both numbers, but different type objects, order         # by the id of the type object         return id(type(self)) < id(type(other))   @per_type_cmp(dict) def dict_cmp(a, b, _s=object()):     if len(a) != len(b):         return len(a) < len(b)     adiff = min((k for k in a if a[k] != b.get(k, _s)), key=python2_sort_key, default=_s)     if adiff is _s:         # All keys in a have a matching value in b, so the dicts are equal         return False     bdiff = min((k for k in b if b[k] != a.get(k, _s)), key=python2_sort_key)     if adiff != bdiff:         return python2_sort_key(adiff) < python2_sort_key(bdiff)     return python2_sort_key(a[adiff]) < python2_sort_key(b[bdiff])

I incorporated handling dictionary sorting as implemented in Python 2, since that'd be supported by the type itself via a __cmp__ hook. I've stuck to the Python 2 ordering for the keys and values as well, naturally.

I've also added special casing for complex numbers, as Python 2 raises an exception when you try sort to these:

>>> sorted([0.0, 1, (1+0j), False, (2+3j)]) Traceback (most recent call last):   File "<stdin>", line 1, in <module> TypeError: no ordering relation is defined for complex numbers

You may have to add more special cases if you want to emulate Python 2 behaviour exactly.

If you wanted to sort complex numbers anyway you'll need to consistently put them with the non-numbers group; e.g.:

# Sort by typename, but numbers are sorted before other types if isinstance(self, Number) and not isinstance(self, complex):     self_tname = '' else:     self_tname = type(self).__name__ if isinstance(other, Number) and not isinstance(other, complex):     other_tname = '' else:     other_tname = type(other).__name__

Some test cases:

>>> sorted([0, 'one', 2.3, 'four', -5], key=python2_sort_key) [-5, 0, 2.3, 'four', 'one'] >>> sorted([0, 123.4, 5, -6, 7.89], key=python2_sort_key) [-6, 0, 5, 7.89, 123.4] >>> sorted([{1:2}, {3:4}], key=python2_sort_key) [{1: 2}, {3: 4}] >>> sorted([{1:2}, None, {3:4}], key=python2_sort_key) [None, {1: 2}, {3: 4}]

answered Oct 01 '22 13:10

Martijn Pieters

Related questions
                            
                                How to specify version ranges in install_requires (setuptools, distribute)
                            
                                How to use Win32 API with Python?
                            
                                Django ModelForm with extra fields that are not in the model
                            
                                SQLAlchemy, clear database content but don't drop the schema
                            
                                Python POST binary data
                            
                                How do you resolve Python package dependencies with pipenv?
                            
                                How to prevent "too broad exception" in this case?
                            
                                Using IntellijIdea within an existing virtualenv
                            
                                Filter Pandas DataFrame by time index
                            
                                Meaning of leading underscore in list of tuples used to define choice fields?
                            
                                How to assign a NULL value to a pointer in python?
                            
                                How does one ignore unexpected keyword arguments passed to a function?
                            
                                How to generate a random UUID which is reproducible (with a seed) in Python
                            
                                Reserve memory for list in Python?
                            
                                constructing absolute path with os.path.join()
                            
                                typing.Any vs object?
                            
                                How to get row number in dataframe in Pandas?
                            
                                Using an attribute of the current class instance as a default value for method's parameter [duplicate]
                            
                                How to inject variable into scope with a decorator?
                            
                                Python 3.x BaseHTTPServer or http.server

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How can I get 2.x-like sorting behaviour in Python 3.x?

Tags:

python

sorting

python-3.x

python-2.x

Zero Piraeus

People also ask

2 Answers

Bas Swinckels

Martijn Pieters

Recent Activity

Donate For Us