python pandas groupby() result

I have the following python pandas data frame:

    import pandas as pd

    df = pd.DataFrame({
        'A': [1,1,1,1,2,2,2,3,3,4,4,4],
        'B': [5,5,6,7,5,6,6,7,7,6,7,7],
        'C': [1,1,1,1,1,1,1,1,1,1,1,1]
    })

    df
        A  B  C
    0   1  5  1
    1   1  5  1
    2   1  6  1
    3   1  7  1
    4   2  5  1
    5   2  6  1
    6   2  6  1
    7   3  7  1
    8   3  7  1
    9   4  6  1
    10  4  7  1
    11  4  7  1

I would like to add another column that stores the sum of the C values for each fixed pair of A and B. That is, something like:

        A  B  C  D
    0   1  5  1  2
    1   1  5  1  2
    2   1  6  1  1
    3   1  7  1  1
    4   2  5  1  1
    5   2  6  1  2
    6   2  6  1  2
    7   3  7  1  2
    8   3  7  1  2
    9   4  6  1  1
    10  4  7  1  2
    11  4  7  1  2

I have tried with pandas groupby and it kind of works:

    res = {}
    for a, group_by_A in df.groupby('A'):
        group_by_B = group_by_A.groupby('B', as_index=False)
        res[a] = group_by_B['C'].sum()

but I don't know how to get the results from res back into df in an orderly fashion. I would be very happy with any advice on this. Thank you.

Simon Righley asked Jul 16 '13 00:07



1 Answer

Here's one way (it feels like this should work in one go with an apply, but I can't get that to work).

    In [11]: g = df.groupby(['A', 'B'])

    In [12]: df1 = df.set_index(['A', 'B'])

The groupby size method is the one you want; its result is indexed by 'A' and 'B', so we have to give df1 the same index for the values to line up:

    In [13]: df1['D'] = g.size()  # unfortunately this doesn't play nice with as_index=False
    # Same would work with g['C'].sum()

    In [14]: df1.reset_index()
    Out[14]:
        A  B  C  D
    0   1  5  1  2
    1   1  5  1  2
    2   1  6  1  1
    3   1  7  1  1
    4   2  5  1  1
    5   2  6  1  2
    6   2  6  1  2
    7   3  7  1  2
    8   3  7  1  2
    9   4  6  1  1
    10  4  7  1  2
    11  4  7  1  2
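
For comparison, here is a minimal sketch of a one-step variant that the approach above does not use: the groupby `transform` method, which returns a result with the same index as the original frame, so no `set_index`/`reset_index` round trip is needed. It assumes a reasonably recent pandas version and rebuilds the question's `df` so that it runs on its own.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

# transform('sum') computes the sum of C within each (A, B) group and
# broadcasts that sum back to every row of the group, keeping df's index.
df['D'] = df.groupby(['A', 'B'])['C'].transform('sum')

print(df)
```

Because C is 1 in every row here, the group sum equals the group size, which is why `size()` and `g['C'].sum()` give the same D column for this data; with other values in C only the sum matches what the question asks for.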
Andy Hayden answered Dec 19 '22 14:12