Python: sort an array of dictionaries with custom comparator?

Tags:

python

I have the following Python array of dictionaries:

myarr = [ { 'name': 'Richard', 'rank': 1 },
{ 'name': 'Reuben', 'rank': 4 },
{ 'name': 'Reece', 'rank': 0 },
{ 'name': 'Rohan', 'rank': 3 },
{ 'name': 'Ralph', 'rank': 2 },
{ 'name': 'Raphael', 'rank': 0 },
{ 'name': 'Robin', 'rank': 0 } ]

I'd like to sort it by the rank values, ordering as follows: 1-2-3-4-0-0-0.

If I try:

sorted_master_list = sorted(myarr, key=itemgetter('rank'))

then the list is sorted in the order 0-0-0-1-2-3-4.

How can I define a custom comparator function to push zeroes to the bottom of the list? I'm wondering if I can use something like methodcaller.


asked Apr 12 '12 18:04

1 Answer

Option 1:

key=lambda d:(d['rank']==0, d['rank'])

Option 2:

key=lambda d:d['rank'] if d['rank']!=0 else float('inf')

Demo:

"I'd like to sort it by the rank values, ordering as follows: 1-2-3-4-0-0-0." --original poster

>>> sorted([0,0,0,1,2,3,4], key=lambda x:(x==0, x))
[1, 2, 3, 4, 0, 0, 0]

>>> sorted([0,0,0,1,2,3,4], key=lambda x:x if x!=0 else float('inf'))
[1, 2, 3, 4, 0, 0, 0]
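Applied to the dictionaries from the question rather than to bare integers, both options can be sketched as follows. The variable names by_tuple, by_inf, ranks_tuple, ranks_inf and names_tuple are only illustrative.

```python
# Data from the question.
myarr = [
    {'name': 'Richard', 'rank': 1},
    {'name': 'Reuben', 'rank': 4},
    {'name': 'Reece', 'rank': 0},
    {'name': 'Rohan', 'rank': 3},
    {'name': 'Ralph', 'rank': 2},
    {'name': 'Raphael', 'rank': 0},
    {'name': 'Robin', 'rank': 0},
]

# Option 1: tuple key; the leading boolean pushes rank 0 to the end.
by_tuple = sorted(myarr, key=lambda d: (d['rank'] == 0, d['rank']))

# Option 2: treat rank 0 as infinity.
by_inf = sorted(myarr, key=lambda d: d['rank'] if d['rank'] != 0 else float('inf'))

ranks_tuple = [d['rank'] for d in by_tuple]
ranks_inf = [d['rank'] for d in by_inf]
names_tuple = [d['name'] for d in by_tuple]

print(ranks_tuple)  # [1, 2, 3, 4, 0, 0, 0]
print(names_tuple)
```

Because sorted() is stable, the three rank-0 entries keep their original relative order (Reece, Raphael, Robin) under either key.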

Additional comments:

"Please could you explain to me (a Python novice) what it's doing? I can see that it's a lambda, which I know is an anonymous function: what's the bit in brackets?" – OP comment

Indexing/slice notation:

itemgetter('rank') is the same thing as lambda x: x['rank'] is the same thing as the function:

def getRank(myDict):
    return myDict['rank']

The [...] is called indexing/slice notation; see "Explain Python's slice notation". Also note that someArray[n] is common notation for indexing in many programming languages, but those languages may not support slices of the form [start:end] or [start:end:step].
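A small sketch of that equivalence, plus the two slice forms mentioned. The names get_rank, record and ranks are made up for illustration.

```python
from operator import itemgetter

def get_rank(my_dict):
    return my_dict['rank']

record = {'name': 'Rohan', 'rank': 3}

# Three ways of spelling "look up the 'rank' entry".
via_itemgetter = itemgetter('rank')(record)
via_lambda = (lambda x: x['rank'])(record)
via_def = get_rank(record)

# Slices work on sequences such as lists, not on dictionaries.
ranks = [1, 4, 0, 3, 2]
first_two = ranks[0:2]    # [start:end]
every_other = ranks[::2]  # [start:end:step] with start and end omitted
```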

key= vs cmp= vs rich comparison:

As for what is going on, there are two common ways to specify how a sorting algorithm orders elements: one is with a key function, and the other is with a cmp function (a lot more versatile, but removed from sorted() and list.sort() in Python 3, where functools.cmp_to_key can adapt one into a key function). A cmp function allows you to arbitrarily specify how two elements should compare (input: a, b; output: negative if a<b, positive if a>b, zero if a==b). Though legitimate, it gives us no major benefit here (we'd have to duplicate code in an awkward manner), and a key function is more natural for your case. (See "object rich comparison" for how to implicitly define the comparison in an elegant but possibly-excessive way.)
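For comparison, here is a rough sketch of the cmp-style approach in Python 3, where the comparator has to be wrapped with functools.cmp_to_key. The function name compare_ranks and the sample records are assumptions for illustration, not code from the question.

```python
from functools import cmp_to_key

def compare_ranks(a, b):
    """Return a negative number if a sorts before b, positive if after, 0 if tied."""
    ra, rb = a['rank'], b['rank']
    if (ra == 0) != (rb == 0):
        # Exactly one of the two ranks is zero: the zero goes last.
        return 1 if ra == 0 else -1
    # Both zero or both non-zero: ordinary numeric comparison.
    return (ra > rb) - (ra < rb)

records = [{'rank': r} for r in [1, 4, 0, 3, 2, 0, 0]]
ordered = sorted(records, key=cmp_to_key(compare_ranks))
cmp_ranks = [d['rank'] for d in ordered]
print(cmp_ranks)  # [1, 2, 3, 4, 0, 0, 0]
```

Note how the zero check and the numeric comparison both have to be spelled out by hand, which is the duplication a key function avoids.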

Implementing your key function:

Unfortunately 0 is an integer and thus has a natural ordering: 0 is normally < 1,2,3... Thus if we want to impose an extra rule, we need to sort the list at a "higher level". We do this by making the key a tuple: tuples are sorted first by their 1st element, then by their 2nd element. The 1st element here is d['rank']==0, which is True only for the zero ranks. True will always be ordered after False, so all the Trues will be ordered after the Falses; within each group the keys then sort as normal: (False,1)<(False,2)<(False,3)<..., (True,1)<(True,2)<..., and (False,*)<(True,*). The alternative (option 2) merely assigns rank-0 dictionaries a value of infinity, since that is guaranteed to be above any possible finite rank.
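The ordering rules described above can be checked directly. This is only a sketch, and the names keys, sorted_keys and inf_is_largest are illustrative.

```python
# Booleans compare like the integers 0 and 1.
assert False < True

# Tuples compare element by element, left to right.
assert (False, 4) < (True, 0)   # decided by the 1st element
assert (False, 1) < (False, 2)  # tie on the 1st element, decided by the 2nd

# The keys that option 1 would compute for ranks 0..4.
keys = [(r == 0, r) for r in [0, 1, 2, 3, 4]]
sorted_keys = sorted(keys)
print(sorted_keys)
# [(False, 1), (False, 2), (False, 3), (False, 4), (True, 0)]

# Option 2 relies on infinity comparing greater than any finite number.
inf_is_largest = float('inf') > 10**100
```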

More general alternative - object rich comparison:

The even more general solution would be to create a class representing records, then implement __lt__, __le__, __eq__, __ne__, __gt__, and __ge__ (the rich comparison operators), or alternatively just implement one of the ordering methods plus __eq__ and use the @functools.total_ordering decorator. This will cause objects of that class to use the custom logic whenever you use comparison operators (e.g. x=Record(name='Joe', rank=12); y=Record(...); x<y). Since the sorted(...) function uses < by default in a comparison sort, this will make the behavior automatic when sorting, and in other places where you use < and the other comparison operators. This may or may not be excessive depending on your use case.
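One possible shape for such a class, as a minimal sketch: the Record class, its _sort_key helper and the sample data are assumptions for illustration. Equality here is defined on rank only, which may not be what you want for real records.

```python
from functools import total_ordering

@total_ordering
class Record:
    def __init__(self, name, rank):
        self.name = name
        self.rank = rank

    def _sort_key(self):
        # Same idea as the tuple key: rank 0 sorts after every other rank.
        return (self.rank == 0, self.rank)

    def __eq__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self._sort_key() == other._sort_key()

    def __lt__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self._sort_key() < other._sort_key()

records = [Record('Richard', 1), Record('Reece', 0), Record('Ralph', 2)]
record_names = [r.name for r in sorted(records)]  # no key= needed
print(record_names)  # ['Richard', 'Ralph', 'Reece']
```

total_ordering fills in __le__, __gt__ and __ge__ from __lt__ and __eq__, so only two methods need to be written by hand.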

Cleaner alternative - don't overload 0 with semantics:

I should however point out that it's a bit artificial to put 0s behind 1,2,3,4,etc. Whether this is justified depends on whether rank=0 really means rank=0, that is, whether rank=0 entries are really "lower" than rank=1 (which in turn are really "lower" than rank=2...). If this is truly the case, then your method is perfectly fine. If this is not the case, then you might consider omitting the 'rank':... entry as opposed to setting 'rank':0. Then you could sort by Lev Levitsky's answer using 'rank' in d, or by:

Option 1 with different scheme:

key=lambda d: ('rank' not in d, d.get('rank', 0))

Option 2 with different scheme:

key=lambda d: d.get('rank', float('inf'))
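A sketch of this "omit the key" scheme on made-up data. The people list is hypothetical, and the tuple key uses d.get('rank', 0) so that a missing 'rank' does not raise KeyError while the tuple is being built.

```python
people = [
    {'name': 'Richard', 'rank': 1},
    {'name': 'Reece'},             # unranked: no 'rank' key at all
    {'name': 'Ralph', 'rank': 2},
    {'name': 'Robin'},             # unranked
]

# Tuple scheme: unranked entries get True as the 1st element and sort last.
by_tuple = sorted(people, key=lambda d: ('rank' not in d, d.get('rank', 0)))

# Infinity scheme: unranked entries are treated as rank infinity.
by_inf = sorted(people, key=lambda d: d.get('rank', float('inf')))

names_by_tuple = [d['name'] for d in by_tuple]
names_by_inf = [d['name'] for d in by_inf]
print(names_by_tuple)  # ['Richard', 'Ralph', 'Reece', 'Robin']
```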

Sidenote: relying on the existence of infinity in Python is borderline a hack. That makes the other solutions mentioned (tuples, object comparison), Lev's filter-then-concatenate solution, and maybe even the slightly-more-complicated cmp solution (typed up by wilson) more generalizable to other languages.

answered Nov 03 '22 01:11

ninjagecko
