Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the best way to find unique unhashable unorderable types in Python 3

So in Python 2 you could use something like

>>> items = [[1, 2], [3], [3], 4, 'a', 'b', 'a']
>>> from itertools import groupby
>>> [k for k, g in groupby(sorted(items))]
[4, [1, 2], [3], 'a', 'b']

Which works well, in O(N log N) time. However Python 3 exclaims TypeError: unorderable types: int() < list(). So What's the best way to do it in Python 3? (I know best is a subjective term but really there should be one way of doing it according to Python)

EDIT: It doesn't have to use a sort, but I'm guessing that would be the best way

like image 312
user2282357 Avatar asked Apr 15 '13 11:04

user2282357


1 Answers

In 2.x, values of two incomparable built-in types are ordered by type. The order of the types is not defined, except that it will be consistent during one run of the interpreter. So, 2 < [2] may be true or false, but it will be consistently true or false.

In 3.x, values of incomparable built-in types are incomparable—meaning they raise a TypeError if you try to compare them. So, 2 < [2] is an error. And, at least as of 3.3, the types themselves aren't even comparable. But if all you want to reproduce is the 2.x behavior, their ids are definitely comparable, and consistent during a run of the interpreter. So:

sorted(items, key=lambda x: (id(type(x)), x))

For your use case, that's all you need.


However, this won't be exactly the same thing that 2.x does, because it means that, for example, 1.5 < 2 may be False (because float > int). If you want to duplicate the exact behavior, you need to write a key function that first tries to compare the values, then on TypeError falls back to comparing the types.

This is one of the few cases where an old-style cmp function is a lot easier to read than a new-style key function, so let's write one of those, then use cmp_to_key on it:

def cmp2x(a, b):
    try:
        if a==b: return 0
        elif a<b: return -1
        elif b<a: return 1
    except TypeError:
        pass
    return cmp2x(id(type(a)), id(type(b)))
sorted(items, key=functools.cmp_to_key(cmp2x))

This still doesn't guarantee the same order between two values of different types that 2.x would give, but since 2.x didn't define any such order (just that it's consistent within one run), there's no way it could.

There is still one real flaw, however: If you define a class whose objects are not fully-ordered, they will end up sorting as equal, and I'm not sure this is the same thing 2.x would do in that case.

like image 192
abarnert Avatar answered Nov 14 '22 20:11

abarnert