Here is my pandas.DataFrame:
        day1   day2   day3
Apple     40     13     98
Orange    32     45     56
Banana    56     76     87
Pineapple 12     19     12
Grape     89     45     67
I want to create a new DataFrame that contains the top 3 fruits with the biggest sum over the three days.
Sum for the three days: apple -- 151, orange -- 133, banana -- 219, pineapple -- 43, grape -- 201.
So the top 3 fruits are: 1) banana; 2) grape; 3) apple.
Here is an expected output:
        day1   day2   day3
Banana    56     76     87
Grape     89     45     67
Apple     40     13     98
How can I do that with pandas.DataFrame?
Thank you!
To sum only specific rows, use the .loc indexer (it takes square brackets, not parentheses). Give the first and last row labels separated by the : operator; with label slices both ends are included. With .loc you can also choose which columns to include. The result can be stored in a new column.
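The .loc slicing described above might look like the following minimal sketch. It rebuilds the question's DataFrame by hand; the names `subset` and `partial` are made up for illustration, and the choice of rows and columns is only an example.

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "day1": [40, 32, 56, 12, 89],
        "day2": [13, 45, 76, 19, 45],
        "day3": [98, 56, 87, 12, 67],
    },
    index=["Apple", "Orange", "Banana", "Pineapple", "Grape"],
)

# Label-based slice: rows Apple through Banana and columns day1 through day2.
# Both ends of a label slice are included.
subset = df.loc["Apple":"Banana", "day1":"day2"]

# Sum across the selected columns for each selected row.
partial = subset.sum(axis=1)
print(partial)
```

Here `axis=1` sums across columns, so the result has one value per selected fruit.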
pandas.DataFrame.head(): the DataFrame class provides a head() method that returns the first n rows of a DataFrame (5 by default).
Here's how you get the index labels of the top 3 fruits by their sum over the days:
In [1]: df.sum(axis=1).order(ascending=False).head(3)
Out[1]:
Banana    219
Grape     201
Apple     151
And you can use that index to reference your original dataset:
In [2]: idx = df.sum(axis=1).order(ascending=False).head(3).index
In [3]: df.ix[idx]
Out[3]:
        day1  day2  day3
Banana    56    76    87
Grape     89    45    67
Apple     40    13    98
[EDIT]
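order() was deprecated and has since been removed from pandas; use sort_values() instead. Likewise, .ix has been removed; use .loc for label-based lookups.
df.sum(axis=1).sort_values(ascending=False).head(3)
df.loc[idx]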
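For recent pandas versions, the whole answer could be put together roughly as follows. This is a sketch that assumes the DataFrame from the question; the variable names (`totals`, `top3_idx`, `top3`, `top3_alt`) are made up for illustration, and `.loc` stands in for the older `.ix`.

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "day1": [40, 32, 56, 12, 89],
        "day2": [13, 45, 76, 19, 45],
        "day3": [98, 56, 87, 12, 67],
    },
    index=["Apple", "Orange", "Banana", "Pineapple", "Grape"],
)

# Row-wise sum over the three days.
totals = df.sum(axis=1)

# Sort the sums in descending order and keep the labels of the first three.
top3_idx = totals.sort_values(ascending=False).head(3).index

# Select those rows from the original DataFrame, in ranked order.
top3 = df.loc[top3_idx]
print(top3)

# Alternative: Series.nlargest does the sort-and-take step in one call.
top3_alt = df.loc[totals.nlargest(3).index]
```

Both variants keep the original columns and return the rows ordered from the largest sum to the smallest.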