Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to merge an aggregate output back to original dataframe

So for generalized purposes of approach:

df= [['A' , 'B']] (dataframe is actually bigger but for simplicity)

SC = df[['A','B']].groupby('A').agg({'B': ['mean', 'std']})

I'm trying to get the output of this back into original df so output can be:

df=[['A','B','mean of B','std of B']]

I tried: pd.merge(df, SC, on=None), got error:

"MergeError: No common columns to perform merge on"

Any help would be greatly appreciated, simplistically if possible.

Thank you

like image 601
konsama Avatar asked Jun 02 '18 22:06

konsama


People also ask

How do I merge Groupby in pandas?

Step 1: split the data into groups by creating a groupby object from the original DataFrame; Step 2: apply a function, in this case, an aggregation function that computes a summary statistic (you can also transform or filter your data in this step); Step 3: combine the results into a new DataFrame.

How do I merge columns into data frames?

Merge Default Pandas DataFrame Without Any Key Column You can pass two DataFrame to be merged to the pandas. merge() method. This collects all common columns in both DataFrames and replaces each common column in both DataFrame with a single one. It merges the DataFrames df and df1 assigns to merged_df .

How do I merge two columns into a DataFrame in pandas?

By use + operator simply you can combine/merge two or multiple text/string columns in pandas DataFrame. Note that when you apply + operator on numeric columns it actually does addition instead of concatenation.


2 Answers

groupby transform

One solution is to perform two groupby.transform calculations:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 3, (50, 2)), columns=['A', 'B'])

df['mean'] = df.groupby('A')['B'].transform('mean')
df['std'] = df.groupby('A')['B'].transform('std')

print(df.head())

   A  B      mean       std
0  0  2  0.866667  0.915475
1  2  2  1.187500  0.910586
2  1  1  0.947368  0.911268
3  1  0  0.947368  0.911268
4  0  2  0.866667  0.915475

groupby agg + merge

Alternatively, you can perform a single groupby aggregation. Then align indices and merge:

# grp dataframe is indexed by A
grp = df.groupby('A')['B'].agg(['mean', 'std'])

# set same index for df, perform merge on indices, then reset index
res = df.set_index('A')\
        .merge(grp, left_index=True, right_index=True)\
        .reset_index()
like image 80
jpp Avatar answered Oct 13 '22 00:10

jpp


I think you just have to specify the columns to merge on:

df.merge(SC, left_on = 'A', right_index=True)

Example:

# Original Dataframe (randomly created):
>>> df
   A  B
0  b  8
1  a  8
2  a  1
3  b  9
4  b  2
5  b  9
6  b  4
7  a  9
8  a  0
9  b  8

# The result of your "SC" object created by groupby and agg
>>> SC
          B          
       mean       std
A                    
a  4.500000  4.654747
b  6.666667  2.943920

# Merge them together on the appropriate columns:
>>> df.merge(SC, left_on = 'A', right_index=True)
   A  B  (B, mean)  (B, std)
0  b  8   6.666667  2.943920
3  b  9   6.666667  2.943920
4  b  2   6.666667  2.943920
5  b  9   6.666667  2.943920
6  b  4   6.666667  2.943920
9  b  8   6.666667  2.943920
1  a  8   4.500000  4.654747
2  a  1   4.500000  4.654747
7  a  9   4.500000  4.654747
8  a  0   4.500000  4.654747

If you want, you can get your merged dataframe in the original order just by adding .sort_index:

df.merge(SC, left_on = 'A', right_index=True).sort_index()
like image 29
sacuL Avatar answered Oct 13 '22 01:10

sacuL