I have a list which I want to sort by multiple keys, like:
L = [ ... ]
L.sort(key = lambda x: ( f(x), g(x) ))
This works fine. However, it results in unnecessary calls to g, which I would like to avoid because g is potentially slow. In other words, I want to partially and lazily evaluate the key.
For example, if f is unique over L (i.e. len(L) == len(set(map(f, L)))), no calls to g should be made.
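To make the waste concrete, here is a tiny self-contained illustration (hypothetical f, g and data, not my real code); the tuple key evaluates g once per element even though f alone already decides the order:

calls = {'g': 0}

def f(x):
    return x

def g(x):
    calls['g'] += 1
    return -x

L = [3, 1, 2]
L.sort(key=lambda x: (f(x), g(x)))
print(calls['g'])   # 3 -- g ran for every element although f is unique over L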
What would be the most elegant/pythonic way to do this?
One way I can think of is to define a custom cmp function (L.sort(cmp=partial_cmp)), but IMO this is less elegant and more complicated than using the key parameter.
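A rough sketch of what I mean (Python 2 style, to match the cmp= call above; partial_cmp is just the name I used, not an existing function):

def partial_cmp(a, b):
    # compare on f first; only fall back to g when f ties
    result = cmp(f(a), f(b))
    if result:
        return result
    return cmp(g(a), g(b))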
Another way would be to define a key-wrapper class which takes a generator expression to generate the different parts of the key, and overrides the comparison operators to compare them one by one. However, I feel there must be a simpler way...
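Concretely, something like this is what I have in mind (using a tuple of key functions rather than a literal generator expression, so the parts can be re-read across comparisons; a rough, untested sketch with names of my own choosing):

class ChainKey(object):
    funcs = (f, g)   # the chain of key functions from above

    def __init__(self, x):
        self.x = x
        self.computed = []

    def part(self, i):
        # compute each key part at most once, only when a comparison needs it
        while len(self.computed) <= i:
            self.computed.append(self.funcs[len(self.computed)](self.x))
        return self.computed[i]

    def __lt__(self, other):
        for i in range(len(self.funcs)):
            a, b = self.part(i), other.part(i)
            if a != b:
                return a < b
        return False

# usage: L.sort(key=ChainKey)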
EDIT: I'm interested in a solution for the general problem of sorting by multiple functions, not only two as in my example above.
You can try using itertools.groupby:
from itertools import groupby

result = []
for groupKey, group in groupby(sorted(L, key=f), key=f):
    sublist = [y for y in group]
    if len(sublist) > 1:
        result += sorted(sublist, key=g)
    else:
        result += sublist
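For illustration, here is a self-contained run with assumed sample data and keys (f and g are hypothetical here); g only gets applied inside the two groups that tie on f:

from itertools import groupby

def f(x): return x // 10     # primary key: tens digit
def g(x): return x % 10      # secondary key: ones digit

L = [15, 23, 11, 42, 21]
result = []
for groupKey, group in groupby(sorted(L, key=f), key=f):
    sublist = [y for y in group]
    if len(sublist) > 1:
        result += sorted(sublist, key=g)   # g used only inside tied groups
    else:
        result += sublist                  # singleton group: g never called
print(result)   # [11, 15, 21, 23, 42]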
Another possibility, even less elegant, but in place:
L.sort(key=f)
start = None
end = None
for i, x in enumerate(L):
    if start is None:
        start = i
    elif f(x) == f(L[start]):
        end = i
    elif end is None:
        start = i
    else:
        L[start:end+1] = sorted(L[start:end+1], key=g)
        start = i
        end = None
if start is not None and end is not None:
    L[start:end+1] = sorted(L[start:end+1], key=g)
First version generalized to any number of functions:
def sortBy(l, keyChain):
    if not keyChain:
        return l
    result = []
    f = keyChain[0]
    for groupKey, group in groupby(sorted(l, key=f), key=f):
        sublist = [y for y in group]
        if len(sublist) > 1:
            result += sortBy(sublist, keyChain[1:])
        else:
            result += sublist
    return result
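For illustration (data and key functions assumed), a three-level key chain can be passed as a plain list; keys deeper in the chain are only evaluated inside groups that tie on all earlier keys:

L = [(1, 'b', 3), (1, 'a', 2), (2, 'a', 1), (1, 'a', 1)]
chain = [lambda t: t[0], lambda t: t[1], lambda t: t[2]]
print(sortBy(L, chain))
# [(1, 'a', 1), (1, 'a', 2), (1, 'b', 3), (2, 'a', 1)]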
The second version generalized to any number of functions (not fully in place though):
def subSort(l, start, end, keyChain):
    part = l[start:end+1]
    sortBy(part, keyChain[1:])
    l[start:end+1] = part

def sortBy(l, keyChain):
    if not keyChain:
        return
    f = keyChain[0]
    l.sort(key=f)
    start = None
    end = None
    for i, x in enumerate(l):
        if start is None:
            start = i
        elif f(x) == f(l[start]):
            end = i
        elif end is None:
            start = i
        else:
            subSort(l, start, end, keyChain)
            start = i
            end = None
    if start is not None and end is not None:
        subSort(l, start, end, keyChain)
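An illustrative call of the in-place variant (again with assumed data); the list is reordered and nothing is returned:

L = [(1, 'b'), (2, 'a'), (1, 'a'), (3, 'c')]
sortBy(L, [lambda p: p[0], lambda p: p[1]])
print(L)   # [(1, 'a'), (1, 'b'), (2, 'a'), (3, 'c')]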
Given a function, you could create a LazyComparer class like this:
def lazy_func(func):
    class LazyComparer(object):
        def __init__(self, x):
            self.x = x
        def __lt__(self, other):
            return func(self.x) < func(other.x)
        def __eq__(self, other):
            return func(self.x) == func(other.x)
    return lambda x: LazyComparer(x)
To make a lazy key function out of multiple functions, you could create a utility function:
def make_lazy(*funcs):
    def wrapper(x):
        return [lazy_func(f)(x) for f in funcs]
    return wrapper
And together they could be used like this:
def countcalls(f):
    "Decorator that makes the function count calls to it."
    def _f(*args, **kwargs):
        _f._count += 1
        return f(*args, **kwargs)
    _f._count = 0
    return _f

@countcalls
def g(x): return x

@countcalls
def f1(x): return 0

@countcalls
def f2(x): return x

def report_calls(*funcs):
    print(' | '.join(['{} calls to {}'.format(f._count, f.__name__)
                      for f in funcs]))

L = list(range(10))[::-1]
L.sort(key=make_lazy(f1, g))
report_calls(f1, g)

g._count = 0
L.sort(key=make_lazy(f2, g))
report_calls(f2, g)
which yields
18 calls to f1 | 36 calls to g
36 calls to f2 | 0 calls to g
The @countcalls decorator above is used to confirm that when f1 returns a lot of ties, g is called to break the ties, but when f2 returns distinct values, g does not get called.
NPE's solution adds memoization within the Key class. With the solution above, you could add memoization outside (independent of) the LazyComparer class:
def memo(f):
    # Author: Peter Norvig
    """Decorator that caches the return value for each call to f(args).
    Then when called again with same args, we can just look it up."""
    cache = {}
    def _f(*args):
        try:
            return cache[args]
        except KeyError:
            cache[args] = result = f(*args)
            return result
        except TypeError:
            # some element of args can't be a dict key
            return f(*args)
    _f.cache = cache
    return _f
f1._count = g._count = 0   # reset the counters before the memoized run
L.sort(key=make_lazy(memo(f1), memo(g)))
report_calls(f1, g)
which results in fewer calls to g:
10 calls to f1 | 10 calls to g
You could use a key object that would lazily evaluate and cache g(x):
class Key(object):
    def __init__(self, obj):
        self.obj = obj
        self.f = f(obj)

    @property
    def g(self):
        if not hasattr(self, "_g"):
            self._g = g(self.obj)
        return self._g

    def __cmp__(self, rhs):
        return cmp(self.f, rhs.f) or cmp(self.g, rhs.g)
Here is an example of use:
def f(x):
    f.count += 1
    return x // 2
f.count = 0

def g(x):
    g.count += 1
    return x
g.count = 0

L = [1, 10, 2, 33, 45, 90, 3, 6, 1000, 1]
print sorted(L, key=Key)
print f.count, g.count