How can I find the indices of the 3 smallest and 3 largest values in a column of my pandas DataFrame? I saw ways to find the max and min, but none to get the top or bottom 3.
What have you tried? You could sort with s.sort() and then call s.head(3).index and s.tail(3).index.
With smaller Series, you're better off just sorting then taking head/tail!
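As a minimal sketch of the sort-then-head/tail approach: this assumes a current pandas release, where `sort_values()` returns a sorted copy (the older `s.sort()` sorted in place). The sample values are copied from the session further down, and the variable names are just for illustration.

```python
import pandas as pd

# Sample values taken from the IPython session in this answer.
s = pd.Series([0.785650, 0.969103, -0.618300, -0.770337, 1.532137,
               1.367863, -0.852839, 0.967317, -0.603416, -0.889278])

# Sort ascending once, then read the index labels off both ends.
ordered = s.sort_values()
smallest_idx = ordered.head(3).index  # labels of the 3 smallest values
largest_idx = ordered.tail(3).index   # labels of the 3 largest values (ascending)

print(list(smallest_idx))
print(list(largest_idx))
```

For a DataFrame column, the same calls should work on `df["col"]` in place of `s`.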
This is a pandas feature request that should land in 0.14 (some fiddly bits with different dtypes still need to be overcome). In the meantime, an efficient solution for larger Series (> 1000 elements) is to use kth_smallest from pandas algos (warning: this function mutates the array it's applied to, so use a copy!):
In [11]: s = pd.Series(np.random.randn(10))
In [12]: s
Out[12]:
0 0.785650
1 0.969103
2 -0.618300
3 -0.770337
4 1.532137
5 1.367863
6 -0.852839
7 0.967317
8 -0.603416
9 -0.889278
dtype: float64
In [13]: n = 3
In [14]: pd.algos.kth_smallest(s.values.astype(float), n - 1)
Out[14]: -0.7703374582084163
In [15]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)]
Out[15]:
3 -0.770337
6 -0.852839
9 -0.889278
dtype: float64
If you want this in order:
In [16]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)].order()
Out[16]:
9 -0.889278
6 -0.852839
3 -0.770337
dtype: float64
If you're worried about duplicates (ties for nth place), you can take the head:
In [17]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)].order().head(n)
Out[17]:
9 -0.889278
6 -0.852839
3 -0.770337
dtype: float64
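For readers on a current pandas, here is a hedged sketch of two alternatives. It assumes that `Series.nsmallest` / `Series.nlargest` are available (they are what the feature request above turned into) and swaps `np.partition` in for `kth_smallest`, since `pd.algos` is internal and may not be importable in newer versions. This is not the original answer's code, just the same thresholding idea with a different selection routine.

```python
import numpy as np
import pandas as pd

# Same sample values as the session above.
s = pd.Series([0.785650, 0.969103, -0.618300, -0.770337, 1.532137,
               1.367863, -0.852839, 0.967317, -0.603416, -0.889278])
n = 3

# Built-in selection: results come back already ordered.
smallest = s.nsmallest(n)  # ascending
largest = s.nlargest(n)    # descending

# Thresholding variant: np.partition returns a partitioned copy, so the
# Series data is left untouched. Position n - 1 holds the nth smallest value.
threshold = np.partition(s.to_numpy(), n - 1)[n - 1]
via_partition = s[s <= threshold].sort_values().head(n)

print(list(smallest.index), list(largest.index))
print(list(via_partition.index))
```

The `.head(n)` at the end plays the same role as in the last step above: it trims the result back to `n` rows if several values tie with the threshold.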