Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Creating aggregated column in DataFrame

Tags:

python

pandas

With the DataFrame below as an example,

In [83]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)}) df Out[83]:    A  B  values 0  1  1      10 1  1  2      15 2  2  1      20 3  2  2      25 

What would be a simple way to generate a new column containing some aggregation of the data over one of the columns?

For example, if I sum values over items in A

In [84]: df.groupby('A').sum()['values'] Out[84]: A 1    25 2    45 Name: values 

How can I get

   A  B  values  sum_values_A 0  1  1      10            25 1  1  2      15            25 2  2  1      20            45 3  2  2      25            45 
like image 550
foglerit Avatar asked Nov 06 '12 18:11

foglerit


People also ask

How do you create aggregate in a data frame?

Use pandas DataFrame. aggregate() function to calculate any aggregations on the selected columns of DataFrame and apply multiple aggregations at the same time. The below example df[['Fee','Discount']] returns a DataFrame with two columns and aggregate('sum') returns the sum for each column.

How do you aggregate data in a DataFrame in Python?

The aggregate() method allows you to apply a function or a list of function names to be executed along one of the axis of the DataFrame, default 0, which is the index (row) axis. Note: the agg() method is an alias of the aggregate() method.

How do I Group A column in pandas?

The Hello, World! of pandas GroupBy You call . groupby() and pass the name of the column that you want to group on, which is "state" . Then, you use ["last_name"] to specify the columns on which you want to perform the actual aggregation. You can pass a lot more than just a single column name to .


2 Answers

In [20]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})  In [21]: df Out[21]:    A  B  values 0  1  1      10 1  1  2      15 2  2  1      20 3  2  2      25  In [22]: df['sum_values_A'] = df.groupby('A')['values'].transform(np.sum)  In [23]: df Out[23]:    A  B  values  sum_values_A 0  1  1      10            25 1  1  2      15            25 2  2  1      20            45 3  2  2      25            45 
like image 108
Wouter Overmeire Avatar answered Sep 21 '22 04:09

Wouter Overmeire


I found a way using join:

In [101]: aggregated = df.groupby('A').sum()['values'] aggregated.name = 'sum_values_A' df.join(aggregated,on='A')  Out[101]:    A  B  values  sum_values_A 0  1  1      10            25 1  1  2      15            25 2  2  1      20            45 3  2  2      25            45 

Anyone has a simpler way to do it?

like image 27
foglerit Avatar answered Sep 21 '22 04:09

foglerit