With the DataFrame below as an example,
    In [83]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})
             df
    Out[83]:
       A  B  values
    0  1  1      10
    1  1  2      15
    2  2  1      20
    3  2  2      25
What would be a simple way to generate a new column containing some aggregation of the data over one of the columns?
For example, if I sum values over items in A:
    In [84]: df.groupby('A').sum()['values']
    Out[84]:
    A
    1    25
    2    45
    Name: values
How can I get the following?

       A  B  values  sum_values_A
    0  1  1      10            25
    1  1  2      15            25
    2  2  1      20            45
    3  2  2      25            45
Use the pandas DataFrame.aggregate() function to compute aggregations on selected columns of a DataFrame; it can also apply several aggregations at the same time. For example, df[['Fee','Discount']] returns a DataFrame with two columns, and calling aggregate('sum') on it returns the sum of each column.
The aggregate() method applies a function, or a list of function names, along one of the axes of the DataFrame. The default is axis 0, the index (row) axis, so each column is aggregated down its rows. Note: the agg() method is an alias of the aggregate() method.
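As a rough illustration of the description above, here is a minimal sketch that runs aggregate() and its alias agg() on the example DataFrame from the question. The variable names `col_sums` and `summary` are made up for this example. Note that this produces one value per column, not one value per row, so on its own it does not answer the original question.

```python
import numpy as np
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# One aggregation over the selected columns: the result is a Series
# indexed by column name.
col_sums = df[['B', 'values']].aggregate('sum')

# Several aggregations at once via the agg() alias: the result is a
# DataFrame with one row per aggregation function.
summary = df[['B', 'values']].agg(['sum', 'mean'])

print(col_sums)
print(summary)
```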
The Hello, World! of pandas GroupBy

You call .groupby() and pass the name of the column that you want to group on, which is "state". Then, you use ["last_name"] to specify the columns on which you want to perform the actual aggregation. You can pass a lot more than just a single column name to .
    In [20]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})

    In [21]: df
    Out[21]:
       A  B  values
    0  1  1      10
    1  1  2      15
    2  2  1      20
    3  2  2      25

    In [22]: df['sum_values_A'] = df.groupby('A')['values'].transform(np.sum)

    In [23]: df
    Out[23]:
       A  B  values  sum_values_A
    0  1  1      10            25
    1  1  2      15            25
    2  2  1      20            45
    3  2  2      25            45
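The session above works because transform() returns a result aligned to the original index, so each row receives the sum of its own group. A minimal standalone sketch of the same idea follows. It passes the string 'sum' instead of np.sum, which is a substitution on my part: newer pandas releases may emit a warning when a NumPy function is passed here, and the string form should give the same result.

```python
import numpy as np
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# transform('sum') computes the per-group sum and broadcasts it back
# to every row of that group, so the result has the same length and
# index as df and can be assigned directly as a new column.
df['sum_values_A'] = df.groupby('A')['values'].transform('sum')

print(df)
```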
I found a way using join:
    In [101]: aggregated = df.groupby('A').sum()['values']
              aggregated.name = 'sum_values_A'
              df.join(aggregated,on='A')
    Out[101]:
       A  B  values  sum_values_A
    0  1  1      10            25
    1  1  2      15            25
    2  2  1      20            45
    3  2  2      25            45
Does anyone have a simpler way to do it?
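For comparison, the join approach can be written as a self-contained script along these lines. This is only a sketch: the names `joined` and `mapped` are made up here, and the Series.map variant is an alternative I am adding, not something from the original post. It looks up each row's A value in the aggregated Series, which avoids the join but relies on the group keys forming that Series' index.

```python
import numpy as np
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# Per-group sums as a named Series indexed by the values of A.
aggregated = df.groupby('A')['values'].sum().rename('sum_values_A')

# join() matches column A of df against the index of the Series and
# returns a new DataFrame; df itself is left unchanged.
joined = df.join(aggregated, on='A')

# Alternative: map each value of A to its group sum.
mapped = df['A'].map(aggregated)

print(joined)
print(mapped)
```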