I have an array that determines an ordering of elements:
order = [3, 1, 4, 2]
And then I want to sort another, larger array (containing only those elements):
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])
such that the element(s) that come first in order
come first in the results, etc.
In straight Python, I would do this with a key function:
sorted(a, key=order.index)
[3, 3, 1, 1, 1, 4, 4, 2]
How can I do this (efficiently) with numpy? Is there a similar notion of "key function" for numpy arrays?
To sort the value of one array according to another in Python use the np. argsort(~) method.
By Sorting a NumPy array in descending order sorts the elements from largest to smallest value. You can use the syntax array[::-1] to reverse the array. For example, sorting [5,8,6,12] in descending order results in [12 8 6 5].
The Python NumPy library is very general. It can use either row-major or column-major ordered arrays, but it defaults to row-major ordering.
Ints
For ints
, we could use bincount
-
np.repeat(order,np.bincount(a)[order])
Sample run -
In [146]: sorted(a, key=order.index)
Out[146]: [3, 3, 1, 1, 1, 4, 4, 2]
In [147]: np.repeat(order,np.bincount(a)[order])
Out[147]: array([3, 3, 1, 1, 1, 4, 4, 2])
Approach #1
Generalizing for all dtypes with bincount
-
# https://stackoverflow.com/a/41242285/ @Andras Deak
def argsort_unique(idx):
n = idx.size
sidx = np.empty(n,dtype=int)
sidx[idx] = np.arange(n)
return sidx
sidx = np.argsort(order)
c = np.bincount(np.searchsorted(order,a,sorter=sidx))
out = np.repeat(order, c[argsort_unique(sidx)])
Approach #2-A
With np.unique
and searchsorted
for the case when all elements from order
are in a
-
unq, count = np.unique(a, return_counts=True)
out = np.repeat(order, count[np.searchsorted(unq, order)])
Approach #2-B
To cover for all cases, we need one extra step -
unq, count = np.unique(a, return_counts=1)
sidx = np.searchsorted(unq, order)
out = np.repeat(order, np.where(unq[sidx] == order,count[sidx],0))
Building on @Divakar's solution, you can count how many times each element occurs and then repeat the ordered elements that many times:
c = Counter(a)
np.repeat(order, [c[v] for v in order])
(You could vectorize the count lookup if you like). I like this because it's linear time, even if it's not pure numpy.
I guess a pure numpy equivalent would look like this:
count = np.unique(a, return_counts=True)[1]
np.repeat(order, count[np.argsort(np.argsort(order))])
But that's less direct, more code, and way too many sorts. :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With