Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting a list of tuples with multiple conditions

I am currently trying to sort the following list:

list_ = [(1, '0101'), (1, '1010'), (1, '101'), (2, '01'), (2, '010'), (2, '10')]

These are the steps I want to take in order to sort it:

  1. Sort the list by the value of the first element of the tuples
  2. Next, sort the list by the length of the second element of the tuples (not the value, the length!) AFTER step 1 finishes.
  3. Next, sort the list by the value of the second element of the tuples AFTER step 1 and step 2 finishes.

My attempt:

sorted_by_length = sorted(list_, key=len x:x[1])

However, I received a syntax error concerning the x after key= len. What is the right variable I should be using in this case?

The correct, sorted list should be:

sorted_by_length = [(1, '101'), (1, '0101'), (1, '1010'), (2, '01'), (2, '10'), (2, '010')]

Thank you for help.

like image 261
Marius Küpper Avatar asked Oct 28 '13 19:10

Marius Küpper


3 Answers

The key function can return a tuple.

sorted_by_length = sorted(list_,
                         key=lambda x: (x[0], len(x[1]), float(x[1])))

This works because tuples are sorted lexicographically: (the first element of the tuple is used for sorting first, then the second element is used for breaking ties, and then the third element is used for breaking any remaining ties.)

See the excellent HOWTO Sort for an explanation of this and other issues related to sorting.


In [1]: list_ = [(1, '0101'), (1, '1010'), (1, '101'), (2, '01'), (2, '010'), (2, '10')]

In [2]: sorted_by_length = sorted(list_,
                         key=lambda x: (x[0], len(x[1]), float(x[1])))
   ...: 
In [3]: sorted_by_length
Out[3]: [(1, '101'), (1, '0101'), (1, '1010'), (2, '01'), (2, '10'), (2, '010')]

If the second element of each tuple is the string representation of an int in binary, then use int(x, 2) instead of float(x) in the sort key. If they are intended to be the decimal representation of an integer, then use int(x).

like image 159
unutbu Avatar answered Oct 07 '22 18:10

unutbu


You can sort using for key function that return collection as result

list_.sort(key=lambda x: [x[0], len(x[1]), x[1]])

key parameter to specify a function to be called on each list element prior to making comparisons.

If You use collection as key result then It will be sorted using first comparing first elements if they are equal then seconds ones will be compared and so on...

P.S. As I understand It is not necessary to cast third item to numeric type because if the equal is the same then for binary values lexicographical and numeric ordering will give same result

like image 39
oleg Avatar answered Oct 07 '22 16:10

oleg


The right solution is to use a key function that returns a tuple, as shown in unutbu's answer. However there is an other way of doing it. Python's sort is guaranteed to be stable, so you can do multiple sorts by different keys and achieve the output you want. In particular:

list_.sort(key=lambda x: float(x[1]))
list_.sort(key=lambda x: len(x[1]))
list_.sort(key=lambda x: x[0])

Demo with IPython:

In [1]: list_ = [(1, '0101'), (1, '1010'), (1, '101'), (2, '01'), (2, '010'), (2, '10')]

In [2]: list_.sort(key=lambda x: float(x[1]))
   ...: list_.sort(key=lambda x: len(x[1]))
   ...: list_.sort(key=lambda x: x[0])
   ...: 

In [3]: list_
Out[3]: [(1, '101'), (1, '0101'), (1, '1010'), (2, '01'), (2, '10'), (2, '010')]

Note: this solution resembles the three steps you described in your question but the steps are reversed! Sort by the primary key last to get the correct output.

Also keep in mind that the algorithm used for sorting is adaptive. this means that when a sequence is already partially sorted it can use the partial order to sort more efficiently(often in linear time instead of nlog(n)). When you sort by multiple keys you often achieve this partial order, hence the multiple calls to sort() do not cost much. However it highly depends on the keys and the data. Sometimes it is more efficient than using tuples as keys, sometimes it's quite slower.


An example of timing. Note that the two solutions take mostly the same time.

In [9]: list_
Out[9]: [(1, '0101'), (1, '1010'), (1, '101'), (2, '01'), (2, '010'), (2, '10')]

In [10]: list_ *= 1000   # better to avoid too small benchmarks.

In [11]: %%timeit
    ...: a = sorted(list_, key=lambda x: (x[0], len(x[1]), float(x[1])))
    ...: 
100 loops, best of 3: 6.04 ms per loop

In [12]: %%timeit
    ...: a = sorted(list_, key=lambda x: float(x[1]))
    ...: a.sort(key=lambda x: len(x[1]))
    ...: a.sort(key=lambda x: x[0])
    ...: 
100 loops, best of 3: 5.72 ms per loop
In [13]: import random
    ...: data = [(random.randint(1, 1000), bin(random.randint(1, 100))[2:]) for _ in range(10000)]
    ...: 

In [14]: %%timeit
    ...: a = sorted(data, key=lambda x: (x[0], len(x[1]), float(x[1])))
    ...: 
100 loops, best of 3: 15.2 ms per loop

In [15]: %%timeit
    ...: a = sorted(data, key=lambda x: float(x[1]))
    ...: a.sort(key=lambda x: len(x[1]))
    ...: a.sort(key=lambda x: x[0])
    ...: 
100 loops, best of 3: 15.1 ms per loop
like image 22
Bakuriu Avatar answered Oct 07 '22 18:10

Bakuriu