Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Help sorting: first by this, and then by that

Tags:

python

sorting

I have a list of tuples I am trying to sort and could use some help.

The field I want to sort by in the tuples looks like "XXX_YYY". First, I want to group the XXX values in reverse order, and then, within those groups, I want to place the YYY values in normal sort order. (NOTE: I am just as happy, actually, sorting the second item in the tuple in this way, reverse order first word, normal order second.)

Here is an example of what I have and what I would like in the end ... not sure how to do it.

mylist = [
    (u'community_news', u'Community: News & Information'), 
    (u'kf_video', u'KF: Video'), 
    (u'community_video', u'Community: Video'), 
    (u'kf_news', u'KF: News & Information'), 
    (u'kf_magazine', u'KF: Magazine')
]

I would like to perform some sort of sort() on this list that will change the output to:

sorted = [
    (u'kf_magazine', u'KF: Magazine'),
    (u'kf_news', u'KF: News & Information'), 
    (u'kf_video', u'KF: Video'), 
    (u'community_news', u'Community: News & Information'), 
    (u'community_video', u'Community: Video'), 
]

I suspect there may be a pythonic way to handle this but am not able to wrap my head around it.

like image 388
thornomad Avatar asked Nov 01 '09 14:11

thornomad


2 Answers

def my_cmp(x, y):
  x1, x2 = x[0].split('_')
  y1, y2 = y[0].split('_')
  return -cmp(x1, y1) or cmp(x2, y2)

my_list = [
    (u'community_news', u'Community: News & Information'), 
    (u'kf_video', u'KF: Video'), 
    (u'community_video', u'Community: Video'), 
    (u'kf_news', u'KF: News & Information'), 
    (u'kf_magazine', u'KF: Magazine')
]

sorted_list = [
    (u'kf_magazine', u'KF: Magazine'),
    (u'kf_news', u'KF: News & Information'), 
    (u'kf_video', u'KF: Video'), 
    (u'community_news', u'Community: News & Information'), 
    (u'community_video', u'Community: Video'), 
]

my_list.sort(cmp=my_cmp)
assert my_list == sorted_list
like image 183
Ayman Hourieh Avatar answered Sep 29 '22 10:09

Ayman Hourieh


Custom comparison functions for sorting, as suggested in existing answers, do make it easy to sort in a mix of ascending and descending orders -- but they have serious performance issues and have been removed in Python 3, leaving only the preferred customization approach -- custom key-extraction functions... much speedier, though more delicate to use for the relatively rare use case of mixed ascending/descending sorts.

In Python 2.*, which supports either kind of customization (not both in the same call to sort or sorted:-), a custom comparison function can be passed as a cmp= named argument; or, a custom key-extraction function can be passed as a key= named argument. In Python 3.*, only the latter option is available.

It's definitely worth understanding the key-extraction approach, even if you think you've just solved your problem with a custom-comparison approach instead: not just for performance, but for future-proofness (Python 3) and for generality (the key= approach also applies to min, max, itertools.groupby... much more general than the cmp= approach!).

Key-extraction is very simple when all the key subfields are to be sorted the same way (all ascending, or all descending) -- you just extract them; it's still pretty easy if the subfields that go "the other way" are numbers (you just change their sign while extracting); the delicate case is exactly the one you have -- multiple string fields that must be compared in oppposite ways.

A reasonably simple approach to solving your problem is a tiny shim class:

class Reverser(object):
  def __init__(self, s): self.s = s
  def __lt__(self, other): return other.s < self.s
  def __eq__(self, other): return other.s == self.s

Note that you only have to supply __lt__ and __eq__ (the < and == operators) -- sort and friends synthesize all other comparisons, if needed, based on those two.

So, armed with this little auxiliary tool, we can proceed easily...:

def getkey(tup):
    a, b = tup[0].split('_')
    return Reverser(a), b

my_list.sort(key=getkey)

As you see, once you "get" the reverser and key extraction concepts, you pay essentially no price for using key extraction instead of custom comparison: the code I suggest is 4 statements for the reverser class (which you can write once and put into your "goodies bag" module somewhere), three for the key extraction function, and of course one for the sort or sorted call -- a total of eight vs the 4 + 1 == 5 of the custom comparison approach in the most compact form (i.e. the one using either cmp with a sign change, or cmp with swapped arguments). Three statements are not much of a price to pay for key-extraction's advantages!-)

Performance is clearly not a big issue with such a short list, but with an even modestly longer (10 times) one...:

# my_list as in the Q, my_cmp as per top A, getkey as here

def bycmp():
  return sorted(my_list*10, cmp=my_cmp)

def bykey():
  return sorted(my_list*10, key=getkey)

...

$ python -mtimeit -s'import so' 'so.bykey()'
1000 loops, best of 3: 548 usec per loop
$ python -mtimeit -s'import so' 'so.bycmp()'
1000 loops, best of 3: 995 usec per loop

I.e., the key= approach is already showing a performance gain of almost two times (sorting the list twice as fast) when working on a 50-items list -- well worth the modest price of "8 lines rather than 5", particularly with all the other advantages I already mentioned!

like image 23
Alex Martelli Avatar answered Sep 29 '22 09:09

Alex Martelli