With the DataFrame below as an example,
In [83]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)}) df Out[83]:    A  B  values 0  1  1      10 1  1  2      15 2  2  1      20 3  2  2      25   What would be a simple way to generate a new column containing some aggregation of the data over one of the columns?
For example, if I sum values over items in A 
In [84]: df.groupby('A').sum()['values'] Out[84]: A 1    25 2    45 Name: values   How can I get
   A  B  values  sum_values_A 0  1  1      10            25 1  1  2      15            25 2  2  1      20            45 3  2  2      25            45 
                Use pandas DataFrame. aggregate() function to calculate any aggregations on the selected columns of DataFrame and apply multiple aggregations at the same time. The below example df[['Fee','Discount']] returns a DataFrame with two columns and aggregate('sum') returns the sum for each column.
The aggregate() method allows you to apply a function or a list of function names to be executed along one of the axis of the DataFrame, default 0, which is the index (row) axis. Note: the agg() method is an alias of the aggregate() method.
The Hello, World! of pandas GroupBy You call . groupby() and pass the name of the column that you want to group on, which is "state" . Then, you use ["last_name"] to specify the columns on which you want to perform the actual aggregation. You can pass a lot more than just a single column name to .
In [20]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})  In [21]: df Out[21]:    A  B  values 0  1  1      10 1  1  2      15 2  2  1      20 3  2  2      25  In [22]: df['sum_values_A'] = df.groupby('A')['values'].transform(np.sum)  In [23]: df Out[23]:    A  B  values  sum_values_A 0  1  1      10            25 1  1  2      15            25 2  2  1      20            45 3  2  2      25            45 
                        I found a way using join:
In [101]: aggregated = df.groupby('A').sum()['values'] aggregated.name = 'sum_values_A' df.join(aggregated,on='A')  Out[101]:    A  B  values  sum_values_A 0  1  1      10            25 1  1  2      15            25 2  2  1      20            45 3  2  2      25            45   Anyone has a simpler way to do it?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With