I have the following python pandas data frame:
    df = pd.DataFrame({
        'A': [1,1,1,1,2,2,2,3,3,4,4,4],
        'B': [5,5,6,7,5,6,6,7,7,6,7,7],
        'C': [1,1,1,1,1,1,1,1,1,1,1,1]
    })
    df

        A  B  C
    0   1  5  1
    1   1  5  1
    2   1  6  1
    3   1  7  1
    4   2  5  1
    5   2  6  1
    6   2  6  1
    7   3  7  1
    8   3  7  1
    9   4  6  1
    10  4  7  1
    11  4  7  1
I would like to add another column that stores the sum of the C values for each fixed pair of A and B. That is, something like:
        A  B  C  D
    0   1  5  1  2
    1   1  5  1  2
    2   1  6  1  1
    3   1  7  1  1
    4   2  5  1  1
    5   2  6  1  2
    6   2  6  1  2
    7   3  7  1  2
    8   3  7  1  2
    9   4  6  1  1
    10  4  7  1  2
    11  4  7  1  2
I have tried with pandas groupby and it kind of works:
    res = {}
    for a, group_by_A in df.groupby('A'):
        group_by_B = group_by_A.groupby('B', as_index = False)
        res[a] = group_by_B['C'].sum()
but I don't know how to 'get' the results from res into df in an orderly fashion. I would be very happy with any advice on this. Thank you.
An aggregation function returns a single aggregated value for each group. Once the groupby object is created, several aggregation operations can be performed on the grouped data.
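As a rough illustration of running several aggregations on one groupby object, here is a minimal sketch using the data frame from the question. The choice of aggregating column B per value of A, and the names by_a and stats, are assumptions made only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

# Create the groupby object once, then reuse it for several aggregations.
by_a = df.groupby('A')

# Each aggregation yields one value per group (one row per distinct A).
stats = by_a['B'].agg(['sum', 'mean', 'max'])
print(stats)
```

The result has one row per group rather than one row per original row, which is why an aggregated result cannot be assigned straight back to df without some alignment step.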
In this article, we'll see how we can display all the values of each group into which a dataframe is divided. The dataframe is first divided into groups using the DataFrame.groupby() method. Then we modify it such that each group contains the values in a list.
You can group DataFrame rows into a list by using the pandas DataFrame.groupby() function on the column of interest, selecting the column you want as a list from each group, and then using apply(list) to get the list for every group.
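The list-per-group idea described above can be sketched as follows, again on the data frame from the question. Grouping by A and collecting B is an arbitrary choice for illustration, and the name b_lists is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

# Group by A, select column B, and turn each group's values into a list.
b_lists = df.groupby('A')['B'].apply(list)
print(b_lists)
```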
Use groupby().sum() to group rows based on one or multiple columns and calculate the sum for each group. groupby() returns a DataFrameGroupBy object, which provides a sum() aggregation method to calculate the sum of a given column for each group.
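One possible way to apply this to the question is to group on both columns at once and then merge the per-group sums back onto the original rows. This is only a sketch, not necessarily the approach intended by the text above; the names sums and merged are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

# One row per distinct (A, B) pair; as_index=False keeps A and B as columns.
sums = df.groupby(['A', 'B'], as_index=False)['C'].sum()

# Rename the summed column to D and attach it to every matching row of df.
# A left merge keeps the rows of df in their original order.
merged = df.merge(sums.rename(columns={'C': 'D'}), on=['A', 'B'], how='left')
print(merged)
```

Grouping on ['A', 'B'] directly also removes the need for the nested loop over A and then B.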
Here's one way (though it feels this should work in one go with an apply, I can't get it).
    In [11]: g = df.groupby(['A', 'B'])

    In [12]: df1 = df.set_index(['A', 'B'])
The size groupby function is the one you want; we have to match it to 'A' and 'B' as the index:
    In [13]: df1['D'] = g.size()  # unfortunately this doesn't play nice with as_index=False
    # Same would work with g['C'].sum()

    In [14]: df1.reset_index()
    Out[14]:
        A  B  C  D
    0   1  5  1  2
    1   1  5  1  2
    2   1  6  1  1
    3   1  7  1  1
    4   2  5  1  1
    5   2  6  1  2
    6   2  6  1  2
    7   3  7  1  2
    8   3  7  1  2
    9   4  6  1  1
    10  4  7  1  2
    11  4  7  1  2
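For what it's worth, a one-step alternative that is not part of the answer above is the groupby transform method, which returns a result aligned to the original rows instead of one row per group. A minimal sketch, assuming a pandas version where transform accepts the string 'sum':

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

# transform('sum') computes the sum of C within each (A, B) group and
# broadcasts it back to every row of that group, keeping df's index.
df['D'] = df.groupby(['A', 'B'])['C'].transform('sum')
print(df)
```

Because the result shares df's index, no set_index / reset_index round trip is needed.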