Sort data before using numpy.median

I'm measuring the median and percentiles of a sample of data using Python.

import numpy as np
xmedian = np.median(data)
x25 = np.percentile(data, 25)
x75 = np.percentile(data, 75)

Do I have to use the np.sort() function on my data before measuring the median?

Marika Blum asked May 23 '13 17:05


1 Answer

According to the documentation of numpy.median, you don't have to sort the data manually before passing it to the function; it handles the ordering internally. The same holds for np.percentile. By default neither function modifies your input array. It is good practice to read the function's source code to understand how it works.

Example, showing that sorting beforehand is unnecessary:

In [1]: import numpy as np

In [2]: data = np.array([[ 10, 23,  1,  4,  5],
   ...:                  [  2, 12,  5, 22, 14]])

In [3]: median = np.median(data)  # Median of unsorted data

In [4]: median
Out[4]: 7.5

In [5]: data_sorted = np.sort(data, axis=None)  # Flatten and sort all values

In [6]: median_sorted = np.median(data_sorted)  # Median of the sorted data

In [7]: median_sorted
Out[7]: 7.5

In [8]: median == median_sorted  # Check that they are equal
Out[8]: True
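
The same check can be run for the percentiles from the question. This is only a minimal sketch: the sample below is randomly generated as a stand-in for your `data`, and the seed and sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
data = rng.normal(size=1001)     # hypothetical sample standing in for `data`
original = data.copy()           # kept to check that the input is left untouched

sorted_data = np.sort(data)      # np.sort returns a sorted copy

# Compute each statistic on the unsorted and on the sorted sample.
unsorted_stats = [np.median(data),
                  np.percentile(data, 25),
                  np.percentile(data, 75)]
sorted_stats = [np.median(sorted_data),
                np.percentile(sorted_data, 25),
                np.percentile(sorted_data, 75)]

same_results = np.allclose(unsorted_stats, sorted_stats)
input_unchanged = np.array_equal(data, original)
print(same_results, input_unchanged)
```

Both values should come out as True: the statistics do not depend on the order of the input, and with the default arguments the array you pass in is not reordered.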
sodd answered Sep 18 '22 23:09