Pandas: how to sum by groupby value

Using this:

import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

df.groupby(['Team',"Rank"]).sum()

This is returned.

             Points
Team   Rank        
Devils 2        863
       3        673
Kings  1       1544
       3        741
       4        812
Riders 1        876
       2       2173
Royals 1        804
       4        701

How can I extract the values (Points) where Rank equals 1, i.e. 1544 + 876 + 804, and do the same for Rank equals 2 and 3?

Merlin asked Apr 23 '18 13:04


3 Answers

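I believe you need DataFrame.xs, applied to the grouped result (in the snippets below, df is the output of df.groupby(['Team', 'Rank']).sum(), not the original DataFrame):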

print (df.xs(1, level=1))

        Points
Team          
Kings     1544
Riders     876
Royals     804

print (df.xs(2, level=1))

        Points
Team          
Devils     863
Riders    2173
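To get the single total the question asks for, the cross-section can be summed. This is only a minimal self-contained sketch; the names grouped, rank1 and rank1_total are mine and do not appear in the answer:

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Grouped result with a (Team, Rank) MultiIndex, as in the question
grouped = df.groupby(['Team', 'Rank']).sum()

# Cross-section for Rank == 1 (level selected by name instead of position)
rank1 = grouped.xs(1, level='Rank')

# Total of the Points column for that rank
rank1_total = rank1['Points'].sum()
print(rank1_total)  # 3224, i.e. 1544 + 876 + 804
```

Selecting the level by name ('Rank') rather than by position (1) is a choice made here for readability; both refer to the same index level.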

To select by multiple criteria, use slicers:

idx = pd.IndexSlice
print (df.loc[idx[:, [1,2]], :])

             Points
Team   Rank        
Devils 2        863
Kings  1       1544
Riders 1        876
       2       2173
Royals 1        804

print (df.loc[idx['Riders', [1,2]], :])

             Points
Team   Rank        
Riders 1        876
       2       2173

If you want to sum all groups by rank, change the grouping columns from ['Team', 'Rank'] to 'Rank' (here df is the original, ungrouped DataFrame):

s = df.groupby("Rank")['Points'].sum()
print (s)
Rank
1    3224
2    3036
3    1414
4    1513
Name: Points, dtype: int64
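Individual totals can be read off that Series by label. A small sketch under the same data; wanted is a name introduced here for illustration:

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# One total per rank, indexed by Rank
s = df.groupby('Rank')['Points'].sum()

# A single rank's total, looked up by label
print(s.loc[1])  # 3224

# Several ranks at once (ranks 1, 2 and 3, as in the question)
wanted = s.loc[[1, 2, 3]]
print(wanted.tolist())  # [3224, 3036, 1414]
```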

If you also need df1, sum per index level 1 (Rank):

df1 = df.groupby(['Team',"Rank"]).sum()
print (df1)
             Points
Team   Rank        
Devils 2        863
       3        673
Kings  1       1544
       3        741
       4        812
Riders 1        876
       2       2173
Royals 1        804
       4        701

# DataFrame.sum(level=...) was removed in pandas 2.0; group on the index level instead
s1 = df1.groupby(level=1).sum()
print (s1)
      Points
Rank        
1       3224
2       3036
3       1414
4       1513
jezrael answered Sep 17 '22 01:09


df[df['Rank'] == 1] # Filter by rank before summing
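Spelled out in full, the filter-then-sum idea might look like the sketch below. The names rank1, rank1_total and per_team are assumptions of mine, not part of the answer:

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Keep only the rows with Rank == 1 (boolean mask on the original frame)
rank1 = df[df['Rank'] == 1]

# Overall total for that rank
rank1_total = rank1['Points'].sum()
print(rank1_total)  # 3224

# Per-team totals within that rank
per_team = rank1.groupby('Team')['Points'].sum()
print(per_team.tolist())  # [1544, 876, 804] for Kings, Riders, Royals
```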
JimPri answered Sep 19 '22 01:09


You can reorder by the rank after summing:

import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
         'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
         'Rank': [1, 2, 2, 3, 3,4 ,1 ,1,2 , 4,1,2],
         'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
df = pd.DataFrame(ipl_data)

result = df.groupby(['Team', 'Rank']).sum().swaplevel().sort_index()
# Or just:
result = df.groupby(['Rank', 'Team']).sum()

print(result)

Output:

             Points
Rank Team          
1    Kings     1544
     Riders     876
     Royals     804
2    Devils     863
     Riders    2173
3    Devils     673
     Kings      741
4    Kings      812
     Royals     701
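With Rank as the outer index level, a plain .loc lookup is enough to pull out one rank. A short sketch, assuming the same data; rank1 is an illustrative name:

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Rank first, Team second, so Rank is the outer index level
result = df.groupby(['Rank', 'Team']).sum()

# Selecting on the outer level returns a frame indexed by Team
rank1 = result.loc[1]
print(rank1['Points'].sum())  # 3224
```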
jdehesa answered Sep 17 '22 01:09