I have a dataframe and I want to return the full rows that contain the largest values in a specified column. So let's say I create a dataframe like this:
df = pd.DataFrame(np.random.randint(0,100,size=(25, 4)), columns=list('ABCD'))
Then I'd have a table like this (sorry I can't get a proper table to form, so I just made a short one up):
A B C D
14 67 35 22
75 21 34 64
And let's say it goes on for 25 rows like that. I want to take the 5 largest values of column C and return those full rows.
If I do:
df['C'].nlargest()
it returns those 5 largest values, but I want it to return the full rows.
I thought the below would work, but it gives me an error of "IndexError: indices are out-of-bounds":
df[df['C'].nlargest()]
I know this will be an easy solution for many people here, but it's stumped me. Thanks for your help.
You want to call nlargest on the dataframe itself and pass the columns parameter:
In [53]: df.nlargest(5, columns=['C'])
Out[53]:
     A   B   C   D
17  43  91  95  32
18  13  36  81  56
7   61  90  76  85
16  68  21  73  68
14   3  64  71  59
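As for why the attempt in the question fails: as far as I can tell, indexing a dataframe with a non-boolean Series treats the values as column selectors, so the five largest values of C are looked up as columns rather than rows. The row labels you actually want are in the index of the nlargest result, so passing that index to .loc is another way to get the full rows. A minimal sketch on a small made-up frame (the data and variable names are mine, not from the question):

```python
import pandas as pd

# Small made-up frame, only for illustration.
df = pd.DataFrame({
    "A": [14, 75, 3, 61, 68, 43, 13],
    "B": [67, 21, 64, 90, 21, 91, 36],
    "C": [35, 34, 71, 76, 73, 95, 81],
    "D": [22, 64, 59, 85, 68, 32, 56],
})

# Series.nlargest keeps the original row labels in its index ...
top_values = df["C"].nlargest(3)

# ... so those labels can be used to pull the full rows.
top_rows = df.loc[top_values.index]

# DataFrame.nlargest does the same thing in one call.
same_rows = df.nlargest(3, columns="C")

print(top_rows)
```

Both top_rows and same_rows should contain the rows labelled 5, 6 and 3, ordered by C descending.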
Without using nlargest, you can sort with sort_values and slice the first 5 rows:
df.sort_values('C', ascending=False).iloc[:5]
or using head
df.sort_values('C',ascending=False).head(5)
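One difference between these approaches shows up when values tie at the cutoff. The sort-based versions always return exactly n rows, while nlargest lets you choose what happens to ties through its keep parameter. A small sketch with a made-up frame (the column D and the variable names are only for illustration):

```python
import pandas as pd

# Made-up data with three rows tied for the largest value of C.
df = pd.DataFrame({"C": [10, 30, 30, 20, 30], "D": list("abcde")})

# Default keep="first": exactly n rows, earliest tied rows win.
first_two = df.nlargest(2, "C")

# keep="all": every row tied with the cutoff is kept, so more than n rows may come back.
with_ties = df.nlargest(2, "C", keep="all")

# sort_values + head: exactly n rows; which tied rows appear depends on the sort.
sorted_two = df.sort_values("C", ascending=False).head(2)

print(len(first_two), len(with_ties), len(sorted_two))
```

If you need every row that shares the fifth-largest value, keep="all" is probably the simplest option.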
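or using quantile (the rows stay in their original order, and fewer than 5 rows can come back when values tie at the cutoff)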
df[df.C>df.C.quantile(1-(5/len(df)))]
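To see the tie caveat of the quantile approach concretely, here is a small sketch. The data is made up and n is 3 instead of 5 to keep it short; the names cutoff, by_quantile and by_nlargest are mine:

```python
import pandas as pd

# Made-up column where the value 7 appears twice, right at the cutoff.
df = pd.DataFrame({"C": [1, 2, 3, 4, 5, 6, 7, 7, 8, 9]})
n = 3

# The quantile lands on the tied value 7 ...
cutoff = df["C"].quantile(1 - n / len(df))

# ... so the strict comparison drops both 7s and only 8 and 9 remain.
by_quantile = df[df["C"] > cutoff]

# nlargest still returns n rows.
by_nlargest = df.nlargest(n, "C")

print(len(by_quantile), len(by_nlargest))
```

So the quantile version is fine when the values are distinct, but nlargest or sort_values is the safer choice if you need exactly n rows.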