
Is there a way to get the index of the median in python in one command?

Tags: python, math, numpy

Is there something like numpy.argmin(x), but for median?

Itay Lieder asked Oct 03 '15 14:10




3 Answers

A quick approximation (for even-length data it picks the upper of the two middle elements):

numpy.argsort(data)[len(data)//2]
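
To make the word "approximation" concrete, here is a small sketch with made-up sample arrays (the names `data`, `even`, `mid` and `upper` are only illustrative). For odd-length input the selected element is the median itself; for even-length input it is the upper of the two middle elements, which differs from what `np.median` reports.

```python
import numpy as np

# Odd length: the middle element of the sorted order is the median itself.
data = np.array([7, 1, 5, 3, 9])
order = np.argsort(data)       # indices that would sort data: [1, 3, 2, 0, 4]
mid = order[len(data) // 2]    # index (into data) of the middle sorted element
print(mid, data[mid])          # 2 5

# Even length: len // 2 selects the upper of the two middle elements,
# while np.median averages the two middle elements.
even = np.array([4, 1, 3, 2])
upper = np.argsort(even)[len(even) // 2]
print(even[upper], np.median(even))   # 3 2.5
```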
o17t H1H' S'k answered Oct 05 '22 02:10


This seems to be an old question, but I found a nice way to do it:

import random
import numpy as np
# Some random list with 20 elements
a = [random.random() for i in range(20)]
# Find the index of the median of a.
# Note: NumPy 1.22+ renames the `interpolation` keyword to `method`.
medIdx = a.index(np.percentile(a, 50, interpolation='nearest'))

The neat trick here is percentile's built-in option for nearest interpolation, which returns a "real" median value, one that actually occurs in the list, so it is safe to search for it afterwards.
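
A hedged variant of the same idea for NumPy arrays, where `list.index` is not available. The helper name `median_index_nearest` is hypothetical. It assumes a non-empty 1-D input, and it tries the `method` keyword first because NumPy 1.22 renamed `interpolation` to `method`; the fallback is only there for older releases.

```python
import numpy as np

def median_index_nearest(values):
    """Index of the first element equal to the 'nearest' 50th percentile."""
    a = np.asarray(values)
    try:
        # NumPy >= 1.22 spelling of the keyword.
        m = np.percentile(a, 50, method="nearest")
    except TypeError:
        # Older NumPy releases only know `interpolation`.
        m = np.percentile(a, 50, interpolation="nearest")
    # 'nearest' returns a value taken from the data, so an exact match exists.
    return int(np.flatnonzero(a == m)[0])

sample = [7, 1, 5, 3, 9]
print(median_index_nearest(sample))   # 2
```

Note that this still scans the whole array a second time to locate the value, and with repeated values it reports the first matching position.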

Hagay answered Oct 05 '22 01:10


In general, this is an ill-posed question because an array does not necessarily contain its own median for numpy's definition of the median. For example:

>>> np.median([1, 2])
1.5

But when the length of the array is odd, the median will generally be in the array, so asking for its index does make sense:

>>> np.median([1, 2, 3])
2.0

For odd-length arrays, an efficient way to determine the index of the median value is by using the np.argpartition function. For example:

import numpy as np

def argmedian(x):
  return np.argpartition(x, len(x) // 2)[len(x) // 2]

# Works for odd-length arrays, where the median is in the array:
x = np.random.rand(101)

print("median in array:", np.median(x) in x)
# median in array: True

print(x[argmedian(x)], np.median(x))
# 0.5819150016674371 0.5819150016674371

# Doesn't work for even-length arrays, where the median is not in the array:
x = np.random.rand(100)

print("median in array:", np.median(x) in x)
# median in array: False

print(x[argmedian(x)], np.median(x))
# 0.6116799104572843 0.6047559243909065

This is quite a bit faster than the accepted sort-based solution as the size of the array grows:

x = np.random.rand(1000)
%timeit np.argsort(x)[len(x)//2]
# 10000 loops, best of 3: 25.4 µs per loop
%timeit np.argpartition(x, len(x) // 2)[len(x) // 2]
# 100000 loops, best of 3: 6.03 µs per loop
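
For even-length arrays, one possible extension (a sketch, not part of the function above) is to return the indices of both middle elements, since `np.median` averages those two. The helper name `argmedian_pair` is hypothetical, and the code assumes a non-empty 1-D input. It relies on `np.argpartition` accepting a sequence of positions for `kth`.

```python
import numpy as np

def argmedian_pair(x):
    """Indices of the lower and upper middle elements of a 1-D array.

    For odd-length input both indices are the same.
    """
    x = np.asarray(x)
    n = len(x)
    lo, hi = (n - 1) // 2, n // 2
    # Partition so that positions lo and hi hold the elements they
    # would hold in a fully sorted array.
    part = np.argpartition(x, sorted({lo, hi}))
    return int(part[lo]), int(part[hi])

x = np.array([4, 1, 3, 2])
i, j = argmedian_pair(x)
print(i, j)                                 # 3 2
print((x[i] + x[j]) / 2, np.median(x))      # 2.5 2.5
```

Whether the lower index, the upper index, or both is the right answer depends on what the caller means by "the median" of an even-length array.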
jakevdp answered Oct 05 '22 03:10