python pandas 3 smallest & 3 largest values

How can I find the index of the 3 smallest and 3 largest values in a column of my pandas DataFrame? I have seen ways to find the max and min, but none that return three values.

user1802143 asked Feb 15 '23 11:02


2 Answers

What have you tried? You could sort with s.sort_values() and then call .head(3).index and .tail(3).index on the sorted result.
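A minimal sketch of this sort-then-slice approach in current pandas, where the old in-place s.sort() is gone and sort_values() returns a new sorted Series instead. The DataFrame, its "value" column name, and the sample numbers (copied from the Out[12] listing in the next answer) are illustrative assumptions.

```python
import pandas as pd

# Assumed example frame; "value" is a hypothetical column name.
df = pd.DataFrame({"value": [0.785650, 0.969103, -0.618300, -0.770337, 1.532137,
                             1.367863, -0.852839, 0.967317, -0.603416, -0.889278]})

# sort_values() returns a new Series sorted ascending; df itself is unchanged.
ordered = df["value"].sort_values()

smallest_idx = ordered.head(3).index  # labels of the 3 smallest values, ascending
largest_idx = ordered.tail(3).index   # labels of the 3 largest values, still ascending

print(list(smallest_idx))
print(list(largest_idx))
```

Note that tail() keeps the ascending order, so the largest value's label comes last.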

TomAugspurger answered Feb 17 '23 00:02


With smaller Series, you're better off just sorting then taking head/tail!

This is a pandas feature request and should arrive in 0.14 (there are some fiddly bits with different dtypes to overcome). In the meantime, an efficient solution for larger Series (more than 1000 elements) is kth_smallest from pd.algos. Warning: this function mutates the array it is applied to, so pass it a copy (below, s.values.astype(float) creates one):

In [11]: s = pd.Series(np.random.randn(10))

In [12]: s
Out[12]: 
0    0.785650
1    0.969103
2   -0.618300
3   -0.770337
4    1.532137
5    1.367863
6   -0.852839
7    0.967317
8   -0.603416
9   -0.889278
dtype: float64

In [13]: n = 3

In [14]: pd.algos.kth_smallest(s.values.astype(float), n - 1)
Out[14]: -0.7703374582084163

In [15]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)]
Out[15]: 
3   -0.770337
6   -0.852839
9   -0.889278
dtype: float64
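pd.algos is an internal module, and kth_smallest may not be reachable under that name in newer pandas releases. A sketch of the same threshold idea using NumPy's public np.partition (swapped in here, not what the answer used), which also selects in linear time on average and returns a partitioned copy, so there is nothing to mutate. The sample values are the ones printed in Out[12]; the largest-side variant is an added assumption.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.785650, 0.969103, -0.618300, -0.770337, 1.532137,
               1.367863, -0.852839, 0.967317, -0.603416, -0.889278])
n = 3
values = s.to_numpy()

# After partitioning, position n - 1 holds the n-th smallest value and
# everything before it is <= that value.
kth_smallest = np.partition(values, n - 1)[n - 1]
# Position len - n holds the n-th largest value.
kth_largest = np.partition(values, len(values) - n)[len(values) - n]

smallest = s[s <= kth_smallest]  # may contain more than n rows if there are ties
largest = s[s >= kth_largest]

print(smallest)
print(largest)
```

As with the kth_smallest version, the rows come back in their original index order, and ties at the threshold can make either selection longer than n.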

If you want this in order:

In [16]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)].order()
Out[16]: 
9   -0.889278
6   -0.852839
3   -0.770337
dtype: float64
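If what you need is the index in order (which is what the question asks for), a hedged sketch that partitions with np.argpartition and then sorts only the n selected positions, so the full Series is never sorted. The helper names n_smallest_labels and n_largest_labels are hypothetical, and n is assumed to be between 1 and len(s).

```python
import numpy as np
import pandas as pd


def n_smallest_labels(s: pd.Series, n: int) -> pd.Index:
    """Index labels of the n smallest values, ascending by value."""
    values = s.to_numpy()
    # Positions of the n smallest values, in no particular order.
    pos = np.argpartition(values, n - 1)[:n]
    # Sort just those n positions by their values.
    pos = pos[np.argsort(values[pos])]
    return s.index[pos]


def n_largest_labels(s: pd.Series, n: int) -> pd.Index:
    """Index labels of the n largest values, descending by value."""
    values = s.to_numpy()
    k = len(values) - n
    # Positions of the n largest values, in no particular order.
    pos = np.argpartition(values, k)[k:]
    # Sort those positions by value, then reverse for descending order.
    pos = pos[np.argsort(values[pos])[::-1]]
    return s.index[pos]


s = pd.Series([0.785650, 0.969103, -0.618300, -0.770337, 1.532137,
               1.367863, -0.852839, 0.967317, -0.603416, -0.889278])
print(list(n_smallest_labels(s, 3)))
print(list(n_largest_labels(s, 3)))
```

Unlike the threshold approach, this always returns exactly n labels; ties at the boundary are broken arbitrarily.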

If you're worried about duplicates (ties for nth place), you can take the head:

In [17]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)].order().head(n)
Out[17]: 
9   -0.889278
6   -0.852839
3   -0.770337
dtype: float64
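For readers on pandas 0.14 or later, the feature mentioned at the top of this answer is available as Series.nsmallest and Series.nlargest. A short sketch, again with an assumed "value" column; the keep="all" option for retaining ties exists in recent releases, so check the documentation for your version.

```python
import pandas as pd

df = pd.DataFrame({"value": [0.785650, 0.969103, -0.618300, -0.770337, 1.532137,
                             1.367863, -0.852839, 0.967317, -0.603416, -0.889278]})

smallest_idx = df["value"].nsmallest(3).index  # ascending by value
largest_idx = df["value"].nlargest(3).index    # descending by value
print(list(smallest_idx), list(largest_idx))

# keep="all" retains every row tied with the n-th value, so the result can be
# longer than n (unlike the .head(n) trick above, which truncates).
ties = pd.Series([1, 2, 2, 3])
tied = ties.nsmallest(2, keep="all")
print(tied)
```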

Andy Hayden answered Feb 16 '23 23:02