
Pandas dataframe select entire rows with highest values from a specified column

Tags: python, pandas

I have a dataframe and I want to return the full rows that contain the largest values in a specified column. Say I create a dataframe like this:

df = pd.DataFrame(np.random.randint(0,100,size=(25, 4)), columns=list('ABCD'))

Then I'd have a table like this (sorry I can't get a proper table to form, so I just made a short one up):

A    B    C    D
14   67   35   22
75   21   34   64

And let's say it goes on for 25 rows like that. I want to take the top 5 largest values of column C and return those full rows.

If I do:

df['C'].nlargest()

it returns those 5 largest values, but I want it to return the full row.

I thought the below would work, but it gives me an error of "IndexError: indices are out-of-bounds":

df[df['C'].nlargest()]

I know this will be an easy solution for many people here, but it's stumped me. Thanks for your help.

Emac asked Oct 28 '25 13:10



2 Answers

Call nlargest on the DataFrame itself rather than on the column, and pass the column name via the columns parameter:

In [53]: df.nlargest(5, columns=['C'])
Out[53]:
     A   B   C   D
17  43  91  95  32
18  13  36  81  56
7   61  90  76  85
16  68  21  73  68
14   3  64  71  59
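
For context, here is a minimal sketch of why the attempt in the question fails and how the Series call can still be used. As far as I understand, df[...] with a non-boolean Series treats the values (the five largest numbers) as column labels to look up, which is why it raises an error. The index of the nlargest result, on the other hand, identifies the rows, so it can be passed to .loc. The small dataframe and the variable names below are made up for illustration.

```python
import pandas as pd

# Hypothetical data with distinct values in column C.
df = pd.DataFrame({
    "A": [14, 75, 3, 61, 43, 13, 68, 20],
    "B": [67, 21, 64, 90, 91, 36, 21, 5],
    "C": [35, 34, 71, 76, 95, 81, 73, 10],
    "D": [22, 64, 59, 85, 32, 56, 68, 40],
})

# Series.nlargest returns the values, indexed by the original row labels.
top_c = df["C"].nlargest(5)

# Use that index to pull the full rows from the dataframe.
rows_via_index = df.loc[top_c.index]

# DataFrame.nlargest does the same thing in one call.
rows_direct = df.nlargest(5, columns=["C"])

print(rows_direct)
```

Both results contain the same rows, ordered from the largest value of C downwards.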
MaxU - stop WAR against UA answered Oct 30 '25 03:10



Without using nlargest, you can sort with sort_values and take the first 5 rows:

df.sort_values('C',ascending=False).iloc[:5,]

or using head

df.sort_values('C',ascending=False).head(5)

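or using quantile (this keeps the original row order, and it can return fewer than 5 rows when values tie at the cutoff)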

df[df.C>df.C.quantile(1-(5/len(df)))]
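
To illustrate how the approaches above compare, here is a small sketch with made-up data, taking the top 3 of 8 rows instead of the top 5 of 25. It assumes the default linear interpolation of quantile. With distinct values the quantile filter selects the same rows as sort_values, but with ties at the cutoff it drops the tied rows.

```python
import pandas as pd

n = 3  # number of rows wanted (assumed for this small example)

# Case 1: all values in C are distinct.
distinct = pd.DataFrame({"C": [35, 34, 71, 76, 95, 81, 73, 10]})
by_sort = distinct.sort_values("C", ascending=False).head(n)
by_quantile = distinct[distinct.C > distinct.C.quantile(1 - n / len(distinct))]

# Case 2: three rows share the value 50 at the cutoff.
tied = pd.DataFrame({"C": [10, 20, 30, 50, 50, 50, 60, 70]})
tied_sort = tied.sort_values("C", ascending=False).head(n)
tied_quantile = tied[tied.C > tied.C.quantile(1 - n / len(tied))]

print(len(by_sort), len(by_quantile))      # same rows, different order
print(len(tied_sort), len(tied_quantile))  # the quantile filter returns fewer rows
```

If you need exactly n rows every time, nlargest or sort_values with head is the safer choice.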
BENY answered Oct 30 '25 05:10
