I have two lists, x and y, and I want to sort x and permute y by the permutation of x-sorting. For example, given
x = [4, 2, 1, 3]
y = [40, 200, 1, 30]
I want to get
x_sorted = [1,2,3,4]
y_sorted = [1, 200, 30, 40]
As discussed in past questions, a simple way to solve this is
x_sorted, y_sorted = zip(*sorted(zip(x,y)))
Here is my question: What is the FASTEST way to do this?
I have three methods to do the task.
import numpy as np
x = np.random.random(1000)
y = np.random.random(1000)
method 1:
x_sorted, y_sorted = zip(*sorted(zip(x,y))) #1.08 ms
method 2:
foo = zip(x,y)
foo.sort()
zip(*foo) #1.05 ms
method 3;
ind = range(1000)
ind.sort(key=lambda i:x[i])
x_sorted = [x[i] for i in ind]
y_sorted = [y[i] for i in ind] #934us
Is there a better method, which executes faster than above three methods?
Additional questions.
I have
%timeit foo = zip(x,y) #1000 loops, best of 3: 220 us per loop
%timeit foo.sort() #10000 loops, best of 3: 78.9 us per loop
%timeit zip(*foo) #10000 loops, best of 3: 73.8 us per loop
A best sorting algorithm in python Quicksort is also considered as the ” fastest” sorting algorithm because it has the best performance in the average case for most inputs.
Sort vs Sorted Performance Here, sort() method is executing faster than sorted() function. Here, sorted() function method is executing faster than sort() function. There is also a difference in execution time in both cases.
sort is faster than each element fetch that max does.
Using numpy.argsort:
>>> import numpy as np
>>> x = np.array([4,2,1,3])
>>> y = np.array([40,200,1,30])
>>> order = np.argsort(x)
>>> x_sorted = x[order]
>>> y_sorted = y[order]
>>> x_sorted
array([1, 2, 3, 4])
>>> y_sorted
array([ 1, 200, 30, 40])
>>> timeit('order = np.argsort(x); x_sorted = x[order]; y_sorted = y[order]', 'from __main__ import x, y, np', number=1000)
0.030632019043
NOTE
This makes sense if input data are already numpy arrays.
You're not timing this properly
%timeit foo.sort()
After the 1st loop, it's already sorted for the remainder. Timsort is very efficient for presorted lists.
I was a little surprised that @Roman's use of a key function was so much faster. You can improve on this further by using itemgetter
from operator import itemgetter
ig0 = itemgetter(0)
zip(*sorted(zip(x, y), key=ig0))
This is about 9% faster than using a lambda function for lists of 1000 elements
>>> x = [4, 2, 1, 3]
>>> y = [40, 200, 1, 30]
>>> x_sorted, y_sorted = zip(*sorted(zip(x, y), key=lambda a:a[0]))
>>> x_sorted
(1, 2, 3, 4)
>>> y_sorted
(1, 200, 30, 40)
Performance:
>>> timeit('foo = zip(x,y); foo.sort(); zip(*foo)', 'from __main__ import x, y', number=1000)
1.0197240443760691
>>> timeit('zip(*sorted(zip(x,y)))', 'from __main__ import x, y', number=1000)
1.0106219310922597
>>> timeit('ind = range(1000); ind.sort(key=lambda i:x[i]); x_sorted = [x[i] for i in ind]; y_sorteds = [y[i] for i in ind]', 'from __main__ import x, y', number=1000)
0.9043525504607857
>>> timeit('zip(*sorted(zip(x, y), key=lambda a:a[0]))', 'from __main__ import x, y', number=1000)
0.8288150863453723
To see the full picture:
>>> timeit('sorted(x)', 'from __main__ import x, y', number=1000)
0.40415491505723367 # just getting sorted list from x
>>> timeit('x.sort()', 'from __main__ import x, y', number=1000)
0.008009909448446706 # sort x inplace
@falsetru method - fastest for np.arrays
>>> timeit('order = np.argsort(x); x_sorted = x[order]; y_sorted = y[order]', 'from __main__ import x, y, np', number=1000)
0.05441799872323827
As @AshwiniChaudhary suggested in comments, for lists there's a way to speed it up by using itertools.izip
instead of zip
:
>>> timeit('zip(*sorted(izip(x, y), key=itemgetter(0)))', 'from __main__ import x, y;from operator import itemgetter;from itertools import izip', number=1000)
0.4265049757161705
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With