
Pandas sum by groupby, but exclude certain columns

What is the best way to do a groupby on a Pandas dataframe while excluding some columns from the aggregation? For example, I have the following dataframe:

Code   Country      Item_Code   Item    Ele_Code    Unit    Y1961    Y1962   Y1963
2      Afghanistan  15          Wheat   5312        Ha      10       20      30
2      Afghanistan  25          Maize   5312        Ha      10       20      30
4      Angola       15          Wheat   7312        Ha      30       40      50
4      Angola       25          Maize   7312        Ha      30       40      50

I want to group by the columns Country and Item_Code and compute the sum only over the columns Y1961, Y1962 and Y1963. The resulting dataframe should look like this:

Code   Country      Item_Code   Item    Ele_Code    Unit    Y1961    Y1962   Y1963
2      Afghanistan  15          C3      5312        Ha      20       40       60
4      Angola       25          C4      7312        Ha      60       80      100

Right now I am doing this:

df.groupby('Country').sum() 

However, this adds up the values in the Item_Code column as well. Is there any way I can specify which columns to include in the sum() operation and which ones to exclude?
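
A minimal reproduction of the problem, as a sketch. It assumes the sample rows above are loaded into a DataFrame named df, and it passes numeric_only=True (not part of the original call) so that the result does not depend on how a given pandas version treats the text columns:

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Code": [2, 2, 4, 4],
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Item": ["Wheat", "Maize", "Wheat", "Maize"],
    "Ele_Code": [5312, 5312, 7312, 7312],
    "Unit": ["Ha", "Ha", "Ha", "Ha"],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

# Summing the whole group adds up every numeric column,
# including identifier columns such as Item_Code.
totals = df.groupby("Country").sum(numeric_only=True)

# Item_Code has been summed (15 + 25), which is not meaningful.
print(totals["Item_Code"])
```

The year columns come out right, but Code, Item_Code and Ele_Code are summed along with them.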

user308827 asked Sep 23 '15 23:09




1 Answer

You can select the columns of a groupby:

In [11]: df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum()
Out[11]:
                       Y1961  Y1962  Y1963
Country     Item_Code
Afghanistan 15            10     20     30
            25            10     20     30
Angola      15            30     40     50
            25            30     40     50

Note that the list passed must be a subset of the dataframe's columns; otherwise you'll see a KeyError.
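
For anyone who wants to run this end to end, here is a self-contained sketch built from the sample data in the question. The variable names (year_cols, by_country_item, by_country, flat) and the non-existent column Y1964 are illustrative only. It also groups by Country alone, which appears to be what produces the totals shown in the question's expected output, and uses as_index=False as one possible way to keep the group key as an ordinary column:

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Code": [2, 2, 4, 4],
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Item": ["Wheat", "Maize", "Wheat", "Maize"],
    "Ele_Code": [5312, 5312, 7312, 7312],
    "Unit": ["Ha", "Ha", "Ha", "Ha"],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

# Only these columns are aggregated; everything else is left out.
year_cols = ["Y1961", "Y1962", "Y1963"]

# Group by both keys: every (Country, Item_Code) pair is unique in this
# sample, so each group's sum equals the original row.
by_country_item = df.groupby(["Country", "Item_Code"])[year_cols].sum()

# Group by Country alone: the two items of each country are added together.
by_country = df.groupby("Country")[year_cols].sum()

# as_index=False keeps the group key as a regular column instead of the index.
flat = df.groupby("Country", as_index=False)[year_cols].sum()

# Selecting a column that is not in the dataframe raises KeyError.
try:
    df.groupby("Country")[["Y1961", "Y1964"]].sum()
    missing_column_raised = False
except KeyError:
    missing_column_raised = True

print(by_country_item)
print(by_country)
print(flat)
```

Grouping by both Country and Item_Code leaves the sample values unchanged because no two rows share the same pair, so which keys to group on depends on the level of detail wanted in the result.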

Andy Hayden answered Sep 28 '22 22:09