Is there something like numpy.argmin(x), but for the median?
median() calculates the median (middle value) of the given data set, sorting the data in ascending order first. For n values in sorted order, the median is the ((n + 1) / 2)th value when n is odd, and the mean of the (n / 2)th and (n / 2 + 1)th values when n is even.
numpy.median(arr, axis=None) computes the median of the array elements along the specified axis.
A quick approximation:
numpy.argsort(data)[len(data)//2]
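As a small check of the argsort approach, here is a sketch on a hand-picked five-element array (the values are illustrative, not from the answer above). For odd lengths it gives the exact position of the median; for even lengths it gives the upper of the two middle elements, which is why it is only an approximation.

```python
import numpy as np

# Illustrative data; any 1-D array with an odd number of elements works the same way.
data = np.array([30, 10, 50, 20, 40])

order = np.argsort(data)         # indices that would sort data: [1, 3, 0, 4, 2]
med_idx = order[len(data) // 2]  # middle entry of the sorted order

print(med_idx, data[med_idx])    # 0 30
```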
This seems to be an old question, but I found a nice way to do it:
import random
import numpy as np
# some random list with 20 elements
a = [random.random() for i in range(20)]
# find the median index of a
# (NumPy >= 1.22 spells this option method='nearest'; older versions use interpolation='nearest')
medIdx = a.index(np.percentile(a, 50, method='nearest'))
The neat trick here is percentile's built-in option for nearest interpolation, which returns an actual value from the list, so it is safe to search for it afterwards.
In general, this is an ill-posed question because an array does not necessarily contain its own median for numpy's definition of the median. For example:
>>> np.median([1, 2])
1.5
But when the length of the array is odd, the median will generally be in the array, so asking for its index does make sense:
>>> np.median([1, 2, 3])
2.0
For odd-length arrays, an efficient way to determine the index of the median value is to use the np.argpartition function. For example:
import numpy as np
def argmedian(x):
    return np.argpartition(x, len(x) // 2)[len(x) // 2]
# Works for odd-length arrays, where the median is in the array:
x = np.random.rand(101)
print("median in array:", np.median(x) in x)
# median in array: True
print(x[argmedian(x)], np.median(x))
# 0.5819150016674371 0.5819150016674371
# Doesn't work for even-length arrays, where the median is not in the array:
x = np.random.rand(100)
print("median in array:", np.median(x) in x)
# median in array: False
print(x[argmedian(x)], np.median(x))
# 0.6116799104572843 0.6047559243909065
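For even-length arrays, one possible workaround is to return the indices of both middle elements, since their mean is what np.median reports. The sketch below is an assumption about what a caller might want, not part of the answer above; argmedian_pair is a hypothetical helper name, and it assumes a non-empty 1-D input. It relies on np.argpartition accepting a sequence of positions for kth.

```python
import numpy as np

def argmedian_pair(x):
    """Return the index of the middle element (odd length) or of both middle elements (even length)."""
    x = np.asarray(x)
    n = len(x)
    if n % 2 == 1:
        k = n // 2
        return np.argpartition(x, k)[[k]]
    lo, hi = n // 2 - 1, n // 2
    # Partition around both middle positions at once.
    part = np.argpartition(x, [lo, hi])
    return part[[lo, hi]]

x = np.array([7.0, 1.0, 5.0, 3.0])
idx = argmedian_pair(x)
print(idx, x[idx].mean(), np.median(x))
# [3 2] 4.0 4.0
```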
This is quite a bit faster than the accepted sort-based solution as the size of the array grows:
x = np.random.rand(1000)
%timeit np.argsort(x)[len(x)//2]
# 10000 loops, best of 3: 25.4 µs per loop
%timeit np.argpartition(x, len(x) // 2)[len(x) // 2]
# 100000 loops, best of 3: 6.03 µs per loop
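The %timeit lines above only run in IPython or Jupyter. As a rough equivalent for a plain Python script, the comparison could be sketched with the standard timeit module; the array size and repeat count here are arbitrary choices, and the absolute numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

x = np.random.rand(1001)  # odd length, so the median is an element of x
k = len(x) // 2
n_calls = 1000            # arbitrary repeat count

t_sort = timeit.timeit(lambda: np.argsort(x)[k], number=n_calls)
t_part = timeit.timeit(lambda: np.argpartition(x, k)[k], number=n_calls)

print(f"argsort:      {t_sort / n_calls * 1e6:.2f} us per call")
print(f"argpartition: {t_part / n_calls * 1e6:.2f} us per call")

# Both approaches should point at the same value (the median).
same = x[np.argsort(x)[k]] == x[np.argpartition(x, k)[k]]
```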