Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: sort an array of dictionaries with custom comparator?

Tags:

python

I have the following Python array of dictionaries:

myarr = [ { 'name': 'Richard', 'rank': 1 },
{ 'name': 'Reuben', 'rank': 4 },
{ 'name': 'Reece', 'rank': 0 },
{ 'name': 'Rohan', 'rank': 3 },
{ 'name': 'Ralph', 'rank': 2 },
{ 'name': 'Raphael', 'rank': 0 },
{ 'name': 'Robin', 'rank': 0 } ]

I'd like to sort it by the rank values, ordering as follows: 1-2-3-4-0-0-0.

If I try:

sorted_master_list = sorted(myarr, key=itemgetter('rank'))

then the list is sorted in the order 0-0-0-1-2-3-4.

How can I define a custom comparator function to push zeroes to the bottom of the list? I'm wondering if I can use something like methodcaller.

like image 843
Richard Avatar asked Apr 12 '12 18:04

Richard


People also ask

How do I sort a list of dictionaries in Python?

To sort a list of dictionaries according to the value of the specific key, specify the key parameter of the sort() method or the sorted() function. By specifying a function to be applied to each element of the list, it is sorted according to the result of that function.

How do you sort a list with a custom function in Python?

In Python, we can write custom sort functions that work with sort() and sorted() . The value of the key parameter should be a function that takes a single argument and returns a key for sorting purposes.

How do I sort a custom comparator?

Using a comparator, we can sort the elements based on data members. For instance, it may be on roll no, name, age, or anything else. Method of Collections class for sorting List elements is used to sort the elements of List by the given comparator.


1 Answers

Option 1:

key=lambda d:(d['rank']==0, d['rank'])

Option 2:

key=lambda d:d['rank'] if d['rank']!=0 else float('inf')

Demo:

"I'd like to sort it by the rank values, ordering as follows: 1-2-3-4-0-0-0." --original poster

>>> sorted([0,0,0,1,2,3,4], key=lambda x:(x==0, x))
[1, 2, 3, 4, 0, 0]

>>> sorted([0,0,0,1,2,3,4], key=lambda x:x if x!=0 else float('inf'))
[1, 2, 3, 4, 0, 0]

 

Additional comments:

"Please could you explain to me (a Python novice) what it's doing? I can see that it's a lambda, which I know is an anonymous function: what's the bit in brackets?" – OP comment

Indexing/slice notation:

itemgetter('rank') is the same thing as lambda x: x['rank'] is the same thing as the function:

def getRank(myDict):
    return myDict['rank']

The [...] is called the indexing/slice notation, see Explain Python's slice notation - Also note that someArray[n] is common notation in many programming languages for indexing, but may not support slices of the form [start:end] or [start:end:step].

key= vs cmp= vs rich comparison:

As for what is going on, there are two common ways to specify how a sorting algorithm works: one is with a key function, and the other is with a cmp function (now deprecated in python, but a lot more versatile). While a cmp function allows you to arbitrarily specify how two elements should compare (input: a,b; output: a<b or a>b or a==b). Though legitimate, it gives us no major benefit (we'd have to duplicate code in an awkward manner), and a key function is more natural for your case. (See "object rich comparison" for how to implicitly define cmp= in an elegant but possibly-excessive way.)

Implementing your key function:

Unfortunately 0 is an element of the integers and thus has a natural ordering: 0 is normally < 1,2,3... Thus if we want to impose an extra rule, we need to sort the list at a "higher level". We do this by making the key a tuple: tuples are sorted first by their 1st element, then by their 2nd element. True will always be ordered after False, so all the Trues will be ordered after the Falses; they will then sort as normal: (True,1)<(True,2)<(True,3)<..., (False,1)<(False,2)<..., (False,*)<(True,*). The alternative (option 2), merely assigns rank-0 dictionaries a value of infinity, since that is guaranteed to be above any possible rank.

More general alternative - object rich comparison:

The even more general solution would be to create a class representing records, then implement __lt__, __gt__, __eq__, __ne__, __gt__, __ge__, and all the other rich comparison operators, or alternatively just implement one of those and __eq__ and use the @functools.total_ordering decorator. This will cause objects of that class to use the custom logic whenever you use comparison operators (e.g. x=Record(name='Joe', rank=12) y=Record(...) x<y); since the sorted(...) function uses < and other comparison operators by default in a comparison sort, this will make the behavior automatic when sorting, and in other instances where you use < and other comparison operators. This may or may not be excessive depending on your use case.

Cleaner alternative - don't overload 0 with semantics:

I should however point out that it's a bit artificial to put 0s behind 1,2,3,4,etc. Whether this is justified depends on whether rank=0 really means rank=0; if rank=0 are really "lower" than rank=1 (which in turn are really "lower" than rank=2...). If this is truly the case, then your method is perfectly fine. If this is not the case, then you might consider omitting the 'rank':... entry as opposed to setting 'rank':0. Then you could sort by Lev Levitsky's answer using 'rank' in d, or by:

Option 1 with different scheme:

key=lambda d: (not 'rank' in d, d['rank'])

Option 2 with different scheme:

key=lambda d: d.get('rank', float('inf'))

sidenote: Relying on the existence of infinity in python is almost borderline a hack, making any of the mentioned solutions (tuples, object comparison), Lev's filter-then-concatenate solution, and even maybe the slightly-more-complicated cmp solution (typed up by wilson), more generalizable to other languages.

like image 77
ninjagecko Avatar answered Nov 03 '22 01:11

ninjagecko