Getting top 3 rows that have biggest sum of columns in `pandas.DataFrame`?

Tags: python, pandas, dataframe

Here is my pandas.DataFrame:

        day1   day2   day3
Apple     40     13     98
Orange    32     45     56
Banana    56     76     87
Pineapple 12     19     12
Grape     89     45     67

I want to create a new DataFrame that contains the top 3 fruits with the biggest sum over the three days.

The sums over the three days are: Apple 151, Orange 133, Banana 219, Pineapple 43, Grape 201.
So the top 3 fruits are: 1) Banana; 2) Grape; 3) Apple.

Here is an expected output:

        day1   day2   day3
Banana    56     76     87
Grape     89     45     67
Apple     40     13     98

How can I do that with pandas.DataFrame?

Thank you!


asked Dec 09 '13 20:12

Michael

1 Answer

Here's how you get the index labels of the top 3 rows by sum across the day columns:

In [1]: df.sum(axis=1).order(ascending=False).head(3)
Out[1]:
Banana    219
Grape     201
Apple     151

And you can use that index to select the matching rows from your original dataset:

In [2]: idx = df.sum(axis=1).order(ascending=False).head(3).index

In [3]: df.ix[idx]
Out[3]:
        day1  day2  day3
Banana    56    76    87
Grape     89    45    67
Apple     40    13    98
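
For readers who want to run this end to end on a current pandas release, here is a minimal sketch. It assumes the DataFrame from the question is rebuilt by hand, and it swaps in `sort_values()` and `.loc` because `order()` and `.ix` are no longer available in recent pandas versions. The variable names `totals`, `idx` and `top3` are only illustrative.

```python
import pandas as pd

# Rebuild the DataFrame from the question (values copied from the post).
df = pd.DataFrame(
    {
        "day1": [40, 32, 56, 12, 89],
        "day2": [13, 45, 76, 19, 45],
        "day3": [98, 56, 87, 12, 67],
    },
    index=["Apple", "Orange", "Banana", "Pineapple", "Grape"],
)

# Row-wise sum: one total per fruit.
totals = df.sum(axis=1)

# Sort totals in descending order and keep the labels of the first three.
idx = totals.sort_values(ascending=False).head(3).index

# Label-based selection returns the rows in the order given by idx.
top3 = df.loc[idx]
print(top3)
```

The rows come back in descending order of their totals because `.loc` preserves the order of the labels it is given.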

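[EDIT]

order() was deprecated and has since been removed from pandas. sort_values() can be used here instead:

df.sum(axis=1).sort_values(ascending=False).head(3)

Likewise, .ix has been removed in newer pandas versions; use label-based selection with .loc:

df.loc[idx]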
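
As a possible alternative, not part of the original answer, `Series.nlargest()` can replace the sort-then-head step. The sketch below assumes the same data as in the question and uses `.loc` for the final selection; `top3` is just an illustrative name.

```python
import pandas as pd

# Same data as in the question, rebuilt by hand for a self-contained example.
df = pd.DataFrame(
    {
        "day1": [40, 32, 56, 12, 89],
        "day2": [13, 45, 76, 19, 45],
        "day3": [98, 56, 87, 12, 67],
    },
    index=["Apple", "Orange", "Banana", "Pineapple", "Grape"],
)

# nlargest(3) returns the three largest row sums, largest first,
# so its index can be passed straight to .loc.
top3 = df.loc[df.sum(axis=1).nlargest(3).index]
print(top3)
```

This avoids sorting the whole Series when only a few rows are needed, which may matter on large frames.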


answered Sep 22 '22 09:09

Zelazny7
