I have looked at this answer, which explains how to compute the value of a specific percentile, and this answer, which explains how to compute the percentiles that correspond to each element.
Using the first solution, I can compute the value and scan the original array to find the index.
Using the second solution, I can scan the entire output array for the percentile I'm looking for.
However, both require an additional scan if I want to know the index (in the original array) that corresponds to a particular percentile (or the index of the element closest to that percentile).
Is there a more direct or built-in way to get the index that corresponds to a percentile?
Note: My array is not sorted and I want the index in the original, unsorted array.
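For reference, the first approach described above (compute the value, then scan for it) might look like the sketch below. The array contents and the variable names `value` and `idx_scan` are illustrative assumptions, not taken from the linked answers.

```python
import numpy as np

# Example data (assumed for illustration): an unsorted array.
a = np.array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])

# Pass 1: compute the value of the 25th percentile
# (np.percentile interpolates linearly by default).
value = np.percentile(a, 25)

# Pass 2: scan the original array for the element closest to that value.
idx_scan = int(np.abs(a - value).argmin())

print(idx_scan, a[idx_scan])
```

The second pass is the extra scan the question is trying to avoid.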
It is a little convoluted, but you can get what you are after with np.argpartition. Let's take an easy array and shuffle it:
>>> import numpy as np
>>> a = np.arange(10)
>>> np.random.shuffle(a)
>>> a
array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])
If you want to find, e.g., the index of quantile 0.25, this would correspond to the item in position idx of the sorted array:
>>> idx = 0.25 * (len(a) - 1)
>>> idx
2.25
You need to decide how to round that to an int. Say you go with the nearest integer:
>>> idx = int(idx + 0.5)
>>> idx
2
If you now call np.argpartition, this is what you get:
>>> np.argpartition(a, idx)
array([7, 5, 4, 3, 2, 1, 6, 0, 8, 9], dtype=int64)
>>> np.argpartition(a, idx)[idx]
4
>>> a[np.argpartition(a, idx)[idx]]
2
It is easy to check that these last two expressions are, respectively, the index and the value of the .25 quantile.
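The steps above can be wrapped in a small helper. This is only a sketch under the same assumptions as the walkthrough (1-D input, nearest-integer rounding of the position); the function name `quantile_index` is hypothetical and not part of NumPy.

```python
import numpy as np

def quantile_index(a, q):
    """Return the index in the unsorted 1-D array `a` of the element
    sitting at quantile `q` (0 <= q <= 1), rounding the sorted position
    to the nearest integer as in the walkthrough."""
    a = np.asarray(a)
    if a.ndim != 1 or a.size == 0:
        raise ValueError("expected a non-empty 1-D array")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    # Position of the quantile in the sorted array, rounded to nearest int.
    k = int(q * (a.size - 1) + 0.5)
    # argpartition places the index of the k-th smallest element at slot k.
    return int(np.argpartition(a, k)[k])

a = np.array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])
i = quantile_index(a, 0.25)
print(i, a[i])  # prints: 4 2
```

If the array contains repeated values, the index returned for a tied value is whichever one the partition happens to place at slot k, so it should not be relied on to be the first occurrence.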