How can I find the index of the 3 smallest and 3 largest values in a column in my pandas dataframe? I saw ways to find max and min, but none to get the 3.
What have you tried? You could sort with s.sort() and then call s.head(3).index and s.tail(3).index.
With smaller Series, you're better off just sorting then taking head/tail!
This is a pandas feature request, should see in 0.14 (need to overcome some fiddly bits with different dtypes), an efficient solution for larger Series (> 1000 elements) is using kth_smallest from pandas algos (warning this function mutates the array it's applied to so use a copy!):
In [11]: s = pd.Series(np.random.randn(10))
In [12]: s
Out[12]:
0 0.785650
1 0.969103
2 -0.618300
3 -0.770337
4 1.532137
5 1.367863
6 -0.852839
7 0.967317
8 -0.603416
9 -0.889278
dtype: float64
In [13]: n = 3
In [14]: pd.algos.kth_smallest(s.values.astype(float), n - 1)
Out[14]: -0.7703374582084163
In [15]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)]
Out[15]:
3 -0.770337
6 -0.852839
9 -0.889278
dtype: float64
If you want this in order:
In [16]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)].order()
Out[16]:
9 -0.889278
6 -0.852839
3 -0.770337
dtype: float64
If you're worried about duplicates (join nth place) you can take the head:
In [17]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)].order().head(n)
Out[17]:
9 -0.889278
6 -0.852839
3 -0.770337
dtype: float64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With