Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does a Python custom comparator work?

I have the following Python dict:

[(2, [3, 4, 5]), (3, [1, 0, 0, 0, 1]), (4, [-1]), (10, [1, 2, 3])]

Now I want to sort them on the basis of sum of values of the values of dictionary, so for the first key the sum of values is 3+4+5=12.

I have written the following code that does the job:

def myComparator(a,b):
    print "Values(a,b): ",(a,b)
    sum_a=sum(a[1])
    sum_b=sum(b[1])
    print sum_a,sum_b
    print "Comparision Returns:",cmp(sum_a,sum_b)
    return cmp(sum_a,sum_b)

items.sort(myComparator)
print items

This is what the output that I get after running above:

Values(a,b):  ((3, [1, 0, 0, 0, 1]), (2, [3, 4, 5]))
2 12
Comparision Returns: -1
Values(a,b):  ((4, [-1]), (3, [1, 0, 0, 0, 1]))
-1 2
Comparision Returns: -1
Values(a,b):  ((10, [1, 2, 3]), (4, [-1]))
6 -1
Comparision Returns: 1
Values(a,b):  ((10, [1, 2, 3]), (3, [1, 0, 0, 0, 1]))
6 2
Comparision Returns: 1
Values(a,b):  ((10, [1, 2, 3]), (2, [3, 4, 5]))
6 12
Comparision Returns: -1
[(4, [-1]), (3, [1, 0, 0, 0, 1]), (10, [1, 2, 3]), (2, [3, 4, 5])]

Now I am unable to understand as to how the comparator is working, which two values are being passed and how many such comparisons would happen? Is it creating a sorted list of keys internally where it keeps track of each comparison made? Also the behavior seems to be very random. I am confused, any help would be appreciated.

like image 432
avinash shah Avatar asked Jun 04 '26 12:06

avinash shah


2 Answers

The number and which comparisons are done is not documented and in fact, it can freely change from different implementations. The only guarantee is that if the comparison function makes sense the method will sort the list.

CPython uses the Timsort algorithm to sort lists, so what you see is the order in which that algorithm is performing the comparisons (if I'm not mistaken for very short lists Timsort just uses insertion sort)

Python is not keeping track of "keys". It just calls your comparison function every time a comparison is made. So your function can be called many more than len(items) times.

If you want to use keys you should use the key argument. In fact you could do:

items.sort(key=lambda x: sum(x[1]))

This will create the keys and then sort using the usual comparison operator on the keys. This is guaranteed to call the function passed by key only len(items) times.


Given that your list is:

[a,b,c,d]

The sequence of comparisons you are seeing is:

b < a   # -1  true   --> [b, a, c, d]
c < b   # -1  true   --> [c, b, a, d]
d < c   # 1   false
d < b   # 1   false
d < a   # -1  true   --> [c, b, d, a]
like image 73
Bakuriu Avatar answered Jun 06 '26 03:06

Bakuriu


how the comparator is working

This is well documented:

Compare the two objects x and y and return an integer according to the outcome. The return value is negative if x < y, zero if x == y and strictly positive if x > y.

Instead of calling the cmp function you could have written:

sum_a=sum(a[1])
sum_b=sum(b[1])
if sum_a < sum_b: 
   return -1
elif sum_a == sum_b:
   return 0
else:
   return 1

which two values are being passed

From your print statements you can see the two values that are passed. Let's look at the first iteration:

((3, [1, 0, 0, 0, 1]), (2, [3, 4, 5]))

What you are printing here is a tuple (a, b), so the actual values passed into your comparison functions are

a = (3, [1, 0, 0, 0, 1])
b = (2, [3, 4, 5]))

By means of your function, you then compare the sum of the two lists in each tuple, which you denote sum_a and sum_b in your code.

and how many such comparisons would happen?

I guess what you are really asking: How does the sort work, by just calling a single function?

The short answer is: it uses the Timsort algorithm, and it calls the comparison function O(n * log n) times (note that the actual number of calls is c * n * log n, where c > 0).

To understand what is happening, picture yourself sorting a list of values, say v = [4,2,6,3]. If you go about this systematically, you might do this:

  1. start at the first value, at index i = 0
  2. compare v[i] with v[i+1]
  3. If v[i+1] < v[i], swap them
  4. increase i, repeat from 2 until i == len(v) - 2
  5. start at 1 until no further swaps occurred

So you get, i =

0: 2 < 4 => [2, 4, 6, 3] (swap)
1: 6 < 4 => [2, 4, 6, 3] (no swap)
2: 3 < 6 => [2, 4, 3, 6] (swap)

Start again:

0: 4 < 2 => [2, 4, 3, 6] (no swap)
1: 3 < 4 => [2, 3, 4, 6] (swap)
2: 6 < 4 => [2, 3, 4, 6] (no swap)

Start again - there will be no further swaps, so stop. Your list is sorted. In this example we have run through the list 3 times, and there were 3 * 3 = 9 comparisons.

Obviously this is not very efficient -- the sort() method only calls your comparator function 5 times. The reason is that it employs a more efficient sort algorithm than the simple one explained above.

Also the behavior seems to be very random.

Note that the sequence of values passed to your comparator function is not, in general, defined. However, the sort function does all the necessary comparisons between any two values of the iterable it receives.

Is it creating a sorted list of keys internally where it keeps track of each comparison made?

No, it is not keeping a list of keys internally. Rather the sorting algorithm essentially iterates over the list you give it. In fact it builds subsets of lists to avoid doing too many comparisons - there is a nice visualization of how the sorting algorithm works at Visualising Sorting Algorithms: Python's timsort by Aldo Cortesi

like image 45
miraculixx Avatar answered Jun 06 '26 03:06

miraculixx



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!