Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to sort multiple lists - Python

I have two lists, x and y, and I want to sort x and permute y by the permutation of x-sorting. For example, given

x = [4, 2, 1, 3]
y = [40, 200, 1, 30]

I want to get

x_sorted = [1,2,3,4]
y_sorted = [1, 200, 30, 40]

As discussed in past questions, a simple way to solve this is

x_sorted, y_sorted = zip(*sorted(zip(x,y)))

Here is my question: What is the FASTEST way to do this?


I have three methods to do the task.

import numpy as np
x = np.random.random(1000)
y = np.random.random(1000)

method 1:

x_sorted, y_sorted = zip(*sorted(zip(x,y))) #1.08 ms 

method 2:

foo = zip(x,y)
foo.sort()
zip(*foo)       #1.05 ms

method 3;

ind = range(1000)
ind.sort(key=lambda i:x[i])
x_sorted = [x[i] for i in ind]
y_sorted = [y[i] for i in ind]  #934us

Is there a better method, which executes faster than above three methods?


Additional questions.

  1. Why method 2 is not faster than method 1 though it uses sort method?
  2. If I execute method 2 separately, it is faster. In IPython terminal,

I have

%timeit foo = zip(x,y)   #1000 loops, best of 3: 220 us per loop
%timeit foo.sort()       #10000 loops, best of 3: 78.9 us per loop
%timeit zip(*foo)        #10000 loops, best of 3: 73.8 us per loop
like image 967
ywat Avatar asked Aug 21 '13 04:08

ywat


People also ask

What is the fastest way to sort a list in Python?

A best sorting algorithm in python Quicksort is also considered as the ” fastest” sorting algorithm because it has the best performance in the average case for most inputs.

Is sort () or sorted () faster?

Sort vs Sorted Performance Here, sort() method is executing faster than sorted() function. Here, sorted() function method is executing faster than sort() function. There is also a difference in execution time in both cases.

Is Max or sort faster?

sort is faster than each element fetch that max does.


3 Answers

Using numpy.argsort:

>>> import numpy as np
>>> x = np.array([4,2,1,3])
>>> y = np.array([40,200,1,30])
>>> order = np.argsort(x)
>>> x_sorted = x[order]
>>> y_sorted = y[order]
>>> x_sorted
array([1, 2, 3, 4])
>>> y_sorted
array([  1, 200,  30,  40])

>>> timeit('order = np.argsort(x); x_sorted = x[order]; y_sorted = y[order]', 'from __main__ import x, y, np', number=1000)
0.030632019043

NOTE

This makes sense if input data are already numpy arrays.

like image 71
falsetru Avatar answered Oct 12 '22 23:10

falsetru


You're not timing this properly

%timeit foo.sort()

After the 1st loop, it's already sorted for the remainder. Timsort is very efficient for presorted lists.

I was a little surprised that @Roman's use of a key function was so much faster. You can improve on this further by using itemgetter

from operator import itemgetter
ig0 = itemgetter(0)
zip(*sorted(zip(x, y), key=ig0))

This is about 9% faster than using a lambda function for lists of 1000 elements

like image 29
John La Rooy Avatar answered Oct 12 '22 22:10

John La Rooy


>>> x = [4, 2, 1, 3]
>>> y = [40, 200, 1, 30]    
>>> x_sorted, y_sorted = zip(*sorted(zip(x, y), key=lambda a:a[0]))
>>> x_sorted
(1, 2, 3, 4)
>>> y_sorted
(1, 200, 30, 40)

Performance:

>>> timeit('foo = zip(x,y); foo.sort(); zip(*foo)', 'from __main__ import x, y', number=1000)
1.0197240443760691
>>> timeit('zip(*sorted(zip(x,y)))', 'from __main__ import x, y', number=1000)
1.0106219310922597
>>> timeit('ind = range(1000); ind.sort(key=lambda i:x[i]); x_sorted = [x[i] for i in ind]; y_sorteds = [y[i] for i in ind]', 'from __main__ import x, y', number=1000)
0.9043525504607857
>>> timeit('zip(*sorted(zip(x, y), key=lambda a:a[0]))', 'from __main__ import x, y', number=1000)
0.8288150863453723

To see the full picture:

>>> timeit('sorted(x)', 'from __main__ import x, y', number=1000)
0.40415491505723367            # just getting sorted list from x
>>> timeit('x.sort()', 'from __main__ import x, y', number=1000)
0.008009909448446706           # sort x inplace

@falsetru method - fastest for np.arrays

>>> timeit('order = np.argsort(x); x_sorted = x[order]; y_sorted = y[order]', 'from __main__ import x, y, np', number=1000)
0.05441799872323827

As @AshwiniChaudhary suggested in comments, for lists there's a way to speed it up by using itertools.izip instead of zip:

>>> timeit('zip(*sorted(izip(x, y), key=itemgetter(0)))', 'from __main__ import x, y;from operator import itemgetter;from itertools import izip', number=1000)
0.4265049757161705
like image 33
Roman Pekar Avatar answered Oct 12 '22 22:10

Roman Pekar