I'm wondering what the most efficient way to do an argsort of an array given a condition, while preserving the original index
import numpy as np
x = np.array([0.63, 0.5, 0.7, 0.65])
np.argsort(x)
#Corrected argsort(x) solution
Out[99]: array([1, 0, 3, 2])
I want to argsort this array with condition that x>0.6. Since 0.5 < 0.6, index 1 should not be included.
x = np.array([0.63, 0.5, 0.7, 0.65])
index = x.argsort()
list(filter(lambda i: x[i] > 0.6, index))
[0,3,2]
This is inefficient since its not vectorized.
EDIT: The filter will eliminate most of elements. So ideally, it filter first, then sort, while preserving original index.
Too late to the party too and if my solution is a repeat of an already posted solution - ping me and I will delete it.
def meth_agn_v1(x, thresh):
idx = np.arange(x.size)[x > thresh]
return idx[np.argsort(x[idx])]
Then,
In [143]: meth_agn_v1(x, 0.5)
Out[143]: array([0, 3, 2])
This uses the same idea expressed in the last section of my answer (comparison to Tai's method) that integer indexing is faster than boolean indexing (for small number of expected elements to be selected) and avoids creating an initial index at all.
def meth_agn_v2(x, thresh):
idx, = np.where(x > thresh)
return idx[np.argsort(x[idx])]
In [144]: x = np.random.rand(100000)
In [145]: timeit meth_jp(x, 0.99)
100 loops, best of 3: 7.43 ms per loop
In [146]: timeit meth_alex(x, 0.99)
1000 loops, best of 3: 498 µs per loop
In [147]: timeit meth_tai(x, 0.99)
1000 loops, best of 3: 298 µs per loop
In [148]: timeit meth_agn_v1(x, 0.99)
1000 loops, best of 3: 232 µs per loop
In [161]: timeit meth_agn_v2(x, 0.99)
10000 loops, best of 3: 95 µs per loop
My first version of the answer is very similar to Tai's answer but not identical.
Tai's method as published originally:
def meth_tai(x, thresh):
y = np.arange(x.shape[0])
y = y [x > thresh]
x = x [x > thresh] # x = x[y] is used in my method
y[np.argsort(x)]
So, my method is different in using integer array indexing instead of the boolean indexing used by Tai. For a small number of selected elements integer indexing is faster than boolean indexing making this method more efficient than Tai's method even after Tai optimized his code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With