Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to cast values when using operator.itemgetter() as sort key?

I've a list of lists containing string represented numbers:

nums = [['1','3'],['2','2'],['1','2'],['0','2'],['11','2']]

I need to sort these numerically ascending by the first then second entries without modify the string representation of the numbers in the original list. Also, want to avoid creating another 2nd copy of the list with everything mapped to integers explicitly--imagine this is a huge list.

Both sort() and sorted() works nicely with tuples and lists, so with a lambda key, I can do the following:

>>> sorted(nums, key=lambda n: (int(n[0]),int(n[1])) 
[['0', '2'], ['1', '2'], ['1', '3'], ['2', '2'], ['11', '2']]

Happy days...

However, I've seen a number of postings about sorting being faster using the operator.itemgetter() as key function over using a lambda. Without getting into a discussion of the validity of these claims, does anyone if it's possible apply a conversion from string to integer for comparision when using operator.itemgetter():

The following obviously fails, due to the strings being compared as strings, not numbers:

>>> sorted(nums, key=operator.itemgetter(0,1)) 
[['0', '2'], ['1', '2'], ['1', '3'], ['11', '2'], ['2', '2']]
like image 371
Ray Avatar asked Mar 09 '23 12:03

Ray


2 Answers

operator.itemgetter is fast not because it does something special in sort, but because it is entirely written in c, and does not involve calling a python function.

So what you're looking for is a C function that does what you want - itemgetter is a red herring.

In python 2, you can avoid calling pure-python functions with key=functools.partial(map, int). This won't work in python 3, because map no longer returns a list or tuple. This also might not be any faster than your solution.

like image 181
Eric Avatar answered Apr 05 '23 23:04

Eric


There are ways, for example using iteration_utilities.chained 1 and functools.partial:

>>> import operator import itemgetter
>>> from iteration_utilities import chained
>>> from functools import partial

>>> itemgetter_int = chained(operator.itemgetter(0, 1), partial(map, int), tuple)
>>> sorted(nums, key=itemgetter_int)
[['0', '2'], ['1', '2'], ['1', '3'], ['2', '2'], ['11', '2']]

It works but it's definetly slower than using a lambda or custom defined function.

If you really need speed you could cythonize the lambda-function (or write it in C by hand) but if you just need it in one place just use a throw-away lambda. Especially if it's in sorted because it has O(nlog(n)) comparisons so the O(n) function calls probably don't contribute much to the overall execution time.


1: This is a function in a 3rd party extension module that I authored. It needs to be seperatly installed, for example via conda or pip.

like image 26
MSeifert Avatar answered Apr 06 '23 00:04

MSeifert