I'm measuring the median and percentiles of a sample of data using Python.
import numpy as np
xmedian = np.median(data)
x25 = np.percentile(data, 25)
x75 = np.percentile(data, 75)
Do I have to use the np.sort() function on my data before measuring the median?
According to the documentation of numpy.median, you don't have to sort the data manually before passing it to the function; it handles the ordering internally and, by default, leaves your input array unchanged. The same holds for numpy.percentile. It is also good practice to read the function's source code and try to understand how it works.
Python lists are better optimized for "plain Python" code: reading or writing a single list element is faster than doing the same on a NumPy array. The benefit of NumPy arrays comes from whole-array (vectorized) operations and from compiled extensions.
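As a small illustration of what a "whole array operation" looks like next to element-by-element Python code, here is a minimal sketch. The sample values and variable names are made up for the example, and it only shows that both styles give the same result; it does not measure speed.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
values = [1, 2, 3, 4, 5]
arr = np.array(values)

# Plain Python: touch each list element individually
squares_list = [v * v for v in values]

# NumPy: one whole-array operation, executed in compiled code
squares_arr = arr * arr

print(squares_list)
print(squares_arr.tolist())
```

Both lines print the same squares; the difference is where the loop runs, not what it computes.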
Example, showing that sorting beforehand is unnecessary:
In [1]: import numpy as np
In [2]: data = np.array([[ 10, 23, 1, 4, 5],
...: [ 2, 12, 5, 22, 14]])
In [3]: median = np.median(data) # Median of unsorted data
In [4]: median
Out[4]: 7.5
In [5]: data.sort() # Sort each row in place (along the last axis)
In [6]: median_sorted = np.median(data.ravel()) # Median of the flattened array
In [7]: median_sorted
Out[7]: 7.5
In [8]: median == median_sorted # Check that they are equal
Out[8]: True
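The question also asks about the 25th and 75th percentiles, so here is a similar check for np.percentile. This is only a sketch: the sample values are the same illustrative numbers as above, flattened to one dimension, and the variable names are my own.

```python
import numpy as np

# Illustrative sample values (same numbers as the example above, flattened)
data = np.array([10, 23, 1, 4, 5, 2, 12, 5, 22, 14])
original = data.copy()

# Quartiles computed directly on the unsorted data
q25, q50, q75 = np.percentile(data, [25, 50, 75])

# Quartiles computed on an explicitly sorted copy
s25, s50, s75 = np.percentile(np.sort(data), [25, 50, 75])

# Sorting first makes no difference to the result
same_result = np.allclose([q25, q50, q75], [s25, s50, s75])

# With the default settings the input array is left as it was
input_untouched = np.array_equal(data, original)

print(q25, q50, q75)
print(same_result, input_untouched)
```

Passing a list of percentiles in one call also avoids repeating the work for each quartile.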