I am looking at some code that has a lot of sort calls using comparison functions, and it seems like it should be using key functions.
If you were to change

seq.sort(lambda x, y: cmp(x.xxx, y.xxx))

which is preferable:

seq.sort(key=operator.attrgetter('xxx'))

or:

seq.sort(key=lambda a: a.xxx)
I would also be interested in comments on the merits of making changes to existing code that works.
From the documentation for operator.attrgetter(attr) and operator.attrgetter(*attrs): Return a callable object that fetches attr from its operand. If more than one attribute is requested, returns a tuple of attributes. The attribute names can also contain dots.
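For instance, here is a minimal sketch of what attrgetter can fetch (the Ada/London objects are purely illustrative, not from the question's code):

from operator import attrgetter
from types import SimpleNamespace

p = SimpleNamespace(name='Ada', address=SimpleNamespace(city='London'))

# A single attribute: attrgetter('name')(p) is equivalent to p.name.
get_name = attrgetter('name')
print(get_name(p))                  # Ada

# Several attributes (with a dotted name): returns a tuple,
# which is handy as a multi-level sort key.
get_pair = attrgetter('name', 'address.city')
print(get_pair(p))                  # ('Ada', 'London')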
When choosing purely between attrgetter('attributename') and lambda o: o.attributename as a sort key, attrgetter() is the faster option of the two.
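Both forms produce exactly the same ordering; the difference is only in how the key is computed. A minimal sketch (the Foo class here is illustrative):

from operator import attrgetter

class Foo:
    def __init__(self, bar):
        self.bar = bar

seq = [Foo(3), Foo(1), Foo(2)]

# Both key functions extract the same attribute, so the results match.
by_attrgetter = sorted(seq, key=attrgetter('bar'))
by_lambda = sorted(seq, key=lambda o: o.bar)
assert [o.bar for o in by_attrgetter] == [o.bar for o in by_lambda] == [1, 2, 3]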
Remember that the key function is only applied once to each element in the list, before sorting, so to compare the two we can use them directly in a time trial:
>>> from timeit import Timer
>>> from random import randint
>>> from dataclasses import dataclass, field
>>> @dataclass
... class Foo:
...     bar: int = field(default_factory=lambda: randint(1, 10**6))
...
>>> testdata = [Foo() for _ in range(1000)]
>>> def test_function(objects, key):
...     [key(o) for o in objects]
...
>>> stmt = 't(testdata, key)'
>>> setup = 'from __main__ import test_function as t, testdata; '
>>> tests = {
...     'lambda': setup + 'key=lambda o: o.bar',
...     'attrgetter': setup + 'from operator import attrgetter; key=attrgetter("bar")'
... }
>>> for name, tsetup in tests.items():
...     count, total = Timer(stmt, tsetup).autorange()
...     print(f"{name:>10}: {total / count * 10 ** 6:7.3f} microseconds ({count} repetitions)")
...
    lambda: 130.495 microseconds (2000 repetitions)
attrgetter:  92.850 microseconds (5000 repetitions)
So applying attrgetter('bar') 1000 times is roughly 40 μs faster than applying the lambda. That's because calling a Python function has a certain amount of overhead, more than calling into a native function such as the one produced by attrgetter().
This speed advantage translates into faster sorting too:
>>> def test_function(objects, key):
...     sorted(objects, key=key)
...
>>> for name, tsetup in tests.items():
...     count, total = Timer(stmt, tsetup).autorange()
...     print(f"{name:>10}: {total / count * 10 ** 6:7.3f} microseconds ({count} repetitions)")
...
    lambda: 218.715 microseconds (1000 repetitions)
attrgetter: 169.064 microseconds (2000 repetitions)
"Making changes to existing code that works" is how programs evolve;-). Write a good battery of tests that give known results with the existing code, save those results (that's normally known as "golden files" in a testing context); then make the changes, rerun the tests, and verify (ideally in an automated way) that the only changes to the tests' results are those that are specifically intended to be there -- no undesired or unexpected side effects. One can use more sophisticated quality assurance strategies, of course, but this is the gist of many "integration testing" approaches.
As for the two ways to write a simple key= function, the design intent was to make operator.attrgetter faster by being more specialized, but at least in current versions of Python there's no measurable difference in speed. That being the case, for this special situation I would recommend the lambda, simply because it's more concise and general (and I'm not usually a lambda-lover, mind you!-).
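On the "more general" point: a lambda key can compute arbitrary expressions, whereas attrgetter can only fetch attributes. A small sketch (the Person class and attribute names are illustrative):

from operator import attrgetter

class Person:
    def __init__(self, last, first):
        self.last, self.first = last, first

people = [Person('smith', 'Zoe'), Person('Jones', 'amy')]

# attrgetter can only fetch attributes (here, a tuple of two of them)...
people.sort(key=attrgetter('last', 'first'))

# ...while a lambda can also transform them, e.g. for case-insensitive order.
people.sort(key=lambda p: (p.last.lower(), p.first.lower()))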