I'm trying to get the row of the median value for a column.
I'm using data.median() to get the median value for 'column'.
id 30444.5
someProperty 3.0
numberOfItems 0.0
column 70.0
And data.median()['column'] is subsequently:
data.median()['column']
>>> 70.0
How can I get the row or index of the median value? Is there anything similar to idxmax / idxmin?
I tried filtering on the median value, but that's not reliable when multiple rows have the same value.
Thanks!
You can use rank and idxmin, and apply them to each column:
import numpy as np
import pandas as pd
def get_median_index(d):
    ranks = d.rank(pct=True)
    close_to_median = abs(ranks - 0.5)
    return close_to_median.idxmin()
df = pd.DataFrame(np.random.randn(13, 4))
df
           0         1         2         3
0   0.919681 -0.934712  1.636177 -1.241359
1  -1.198866  1.168437  1.044017 -2.487849
2   1.159440 -1.764668 -0.470982  1.173863
3  -0.055529  0.406662  0.272882 -0.318382
4  -0.632588  0.451147 -0.181522 -0.145296
5   1.180336 -0.768991  0.708926 -1.023846
6  -0.059708  0.605231  1.102273  1.201167
7   0.017064 -0.091870  0.256800 -0.219130
8  -0.333725 -0.170327 -1.725664 -0.295963
9   0.802023  0.163209  1.853383 -0.122511
10  0.650980 -0.386218 -0.170424  1.569529
11  0.678288 -0.006816  0.388679 -0.117963
12  1.640222  1.608097  1.779814  1.028625
df.apply(get_median_index, 0)
0 7
1 7
2 3
3 4
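One caveat with the percentile-rank approach: with an odd number of rows, such as the 13 above, the ranks 6/13 and 7/13 lie equally far from 0.5, so idxmin can return the row just below the true median. In column 0, for example, the median is 0.650980 at index 10, but the result is 7. As a possible alternative, here is a minimal sketch that measures the distance to the median value itself. The helper name median_row_label and the small example frame are made up for illustration and are not from the original answer.

```python
import pandas as pd

def median_row_label(s: pd.Series):
    """Return the index label of the value closest to the column median."""
    # Distance of every value from the median. idxmin returns the first
    # label with the smallest distance, so ties go to the earliest row.
    return (s - s.median()).abs().idxmin()

# Small hypothetical frame with an odd number of rows.
data = pd.DataFrame({"column": [5, 1, 3, 2, 4], "other": [10, 30, 20, 50, 40]})

label = median_row_label(data["column"])   # label of the median row
row = data.loc[label]                      # the full row at that label
per_column = data.apply(median_row_label)  # one label per column
```

With an even number of rows the median is the mean of the two middle values, so this picks whichever of the two comes first in the index. If several rows share the median value, it also returns only the first of them.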