Find the index of the k smallest values of a numpy array

Tags:

python

numpy

In order to find the index of the smallest value, I can use argmin:

```python
import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print(A.argmin())  # 4 because A[4] = 0.1
```

But how can I find the indices of the k-smallest values?

I'm looking for something like:

```python
print(A.argmin(numberofvalues=3))  # [4, 0, 7] because A[4] <= A[0] <= A[7] <= all other A[i]
```

Note: in my use case A has between ~10,000 and 100,000 values, and I'm only interested in the indices of the k = 10 smallest values. k will never be > 10.

527

asked Dec 11 '15 14:12

Basj

2 Answers

Use np.argpartition. It does not sort the entire array. It only guarantees that the element at index k is in its sorted position: every element before it is less than or equal to it, and every element after it is greater than or equal to it. Thus the first k elements will be the k smallest elements.

```python
import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3

idx = np.argpartition(A, k)
print(idx)  # e.g. [4 0 7 3 1 2 6 5]; the order on each side of position k may vary
```

The first k entries of idx are the indices of the k smallest values. Note that these may not be in sorted order.

```python
print(A[idx[:k]])  # [ 0.1  1.   1.5]
```
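The question asks for the indices ordered by value. One option is to partition first and then sort only the k selected entries. The following is a minimal sketch, and the function name `argmin_k` is hypothetical, not a NumPy API. It partitions at `k - 1` rather than `k`. This still places the k smallest values in the first k positions, and it also works when `k == len(A)`, where `np.argpartition(A, k)` would be out of bounds.

```python
import numpy as np


def argmin_k(a, k):
    """Hypothetical helper: indices of the k smallest values of a 1-D array, ordered by value."""
    a = np.asarray(a)
    if not 0 < k <= a.size:
        raise ValueError("k must satisfy 0 < k <= a.size")
    # Partition around position k - 1: the first k positions now hold the k smallest values, unordered.
    part = np.argpartition(a, k - 1)[:k]
    # Sort only those k candidates (O(k log k)) instead of the whole array.
    return part[np.argsort(a[part], kind="stable")]


A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print(argmin_k(A, 3))  # [4 0 7]
```

With k ≤ 10 and up to 100,000 values, the extra sort touches at most 10 elements, so the cost is dominated by the partition.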

To obtain the k largest values, use:

```python
idx = np.argpartition(A, -k)  # e.g. [4 0 7 3 1 2 6 5]
A[idx[-k:]]  # [  9.  17.  17.]
```

WARNING: Do not (re)use idx = np.argpartition(A, k); A[idx[-k:]] to obtain the k-largest. That won't always work. For example, these are NOT the 3 largest values in x:

```python
x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
idx = np.argpartition(x, 3)
x[idx[-3:]]  # array([ 70,  80, 100])
```
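This fails because `np.argpartition(x, 3)` only fixes position 3. The positions after it hold values greater than or equal to `x[idx[3]]`, but in no particular order, so the last three entries can be any three of them. Partitioning at `-k` instead fixes the boundary of the top k. Here is a short sketch with the same `x`. It also shows one way, not the only one, to list the top k largest first.

```python
import numpy as np

x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
k = 3

# Partition so that the last k positions hold the k largest values (unordered).
idx = np.argpartition(x, -k)[-k:]
print(np.sort(x[idx]))  # [ 80  90 100]

# To list them largest first, sort just those k indices by value and reverse.
top = idx[np.argsort(x[idx])[::-1]]
print(top, x[top])  # [0 1 2] [100  90  80]
```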

Here is a comparison against np.argsort, which also works but sorts the entire array to get the result.

```ipython
In [2]: x = np.random.randn(100000)

In [3]: %timeit idx0 = np.argsort(x)[:100]
100 loops, best of 3: 8.26 ms per loop

In [4]: %timeit idx1 = np.argpartition(x, 100)[:100]
1000 loops, best of 3: 721 µs per loop

In [5]: np.all(np.sort(np.argsort(x)[:100]) == np.sort(np.argpartition(x, 100)[:100]))
Out[5]: True
```
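The `%timeit` lines above are IPython magics. Outside IPython, roughly the same comparison can be run with the standard-library `timeit` module. This is only a sketch, and the seed and repeat count are arbitrary choices. The timings it prints depend on the machine and NumPy version, so no numbers are claimed here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # seeded so the equality check is reproducible
x = rng.standard_normal(100_000)
k = 100
runs = 100

# Full sort vs. partial partition; each statement is executed `runs` times.
t_sort = timeit.timeit(lambda: np.argsort(x)[:k], number=runs)
t_part = timeit.timeit(lambda: np.argpartition(x, k)[:k], number=runs)
print(f"argsort:      {t_sort / runs * 1e3:.3f} ms per call")
print(f"argpartition: {t_part / runs * 1e3:.3f} ms per call")

# Both approaches select the same set of indices (their order may differ).
same = np.array_equal(np.sort(np.argsort(x)[:k]), np.sort(np.argpartition(x, k)[:k]))
print(same)
```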

161

answered Sep 19 '22 22:09

unutbu

You can use numpy.argsort with slicing:

```python
>>> import numpy as np
>>> A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
>>> np.argsort(A)[:3]
array([4, 0, 7], dtype=int32)
```
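By default, `np.argsort` uses a quicksort variant that is not stable. When values tie, as the two 17s in `A` do, the order of their indices is therefore not guaranteed. Passing `kind="stable"` keeps tied values in index order. The following is a small sketch. Negating the array is one way to put the largest values first while still breaking ties by lower index.

```python
import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3

# k smallest, ordered by value; equal values keep their original index order.
print(np.argsort(A, kind="stable")[:k])   # [4 0 7]

# k largest, largest first: stable sort of the negated array, so the tied 17s come out as 5, 6.
print(np.argsort(-A, kind="stable")[:k])  # [5 6 2]
```

The negation trick assumes a signed integer or floating-point dtype. For unsigned integers, `-A` wraps around.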

answered Sep 20 '22 22:09

Cory Kramer
