So for generalized purposes of approach:
df= [['A' , 'B']] (dataframe is actually bigger but for simplicity)
SC = df[['A','B']].groupby('A').agg({'B': ['mean', 'std']})
I'm trying to get the output of this back into original df
so output can be:
df=[['A','B','mean of B','std of B']]
I tried: pd.merge(df, SC, on=None)
,
got error:
"MergeError: No common columns to perform merge on"
Any help would be greatly appreciated, simplistically if possible.
Thank you
Step 1: split the data into groups by creating a groupby object from the original DataFrame; Step 2: apply a function, in this case, an aggregation function that computes a summary statistic (you can also transform or filter your data in this step); Step 3: combine the results into a new DataFrame.
Merge Default Pandas DataFrame Without Any Key Column You can pass two DataFrame to be merged to the pandas. merge() method. This collects all common columns in both DataFrames and replaces each common column in both DataFrame with a single one. It merges the DataFrames df and df1 assigns to merged_df .
By use + operator simply you can combine/merge two or multiple text/string columns in pandas DataFrame. Note that when you apply + operator on numeric columns it actually does addition instead of concatenation.
One solution is to perform two groupby.transform
calculations:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 3, (50, 2)), columns=['A', 'B'])
df['mean'] = df.groupby('A')['B'].transform('mean')
df['std'] = df.groupby('A')['B'].transform('std')
print(df.head())
A B mean std
0 0 2 0.866667 0.915475
1 2 2 1.187500 0.910586
2 1 1 0.947368 0.911268
3 1 0 0.947368 0.911268
4 0 2 0.866667 0.915475
Alternatively, you can perform a single groupby
aggregation. Then align indices and merge:
# grp dataframe is indexed by A
grp = df.groupby('A')['B'].agg(['mean', 'std'])
# set same index for df, perform merge on indices, then reset index
res = df.set_index('A')\
.merge(grp, left_index=True, right_index=True)\
.reset_index()
I think you just have to specify the columns to merge on:
df.merge(SC, left_on = 'A', right_index=True)
Example:
# Original Dataframe (randomly created):
>>> df
A B
0 b 8
1 a 8
2 a 1
3 b 9
4 b 2
5 b 9
6 b 4
7 a 9
8 a 0
9 b 8
# The result of your "SC" object created by groupby and agg
>>> SC
B
mean std
A
a 4.500000 4.654747
b 6.666667 2.943920
# Merge them together on the appropriate columns:
>>> df.merge(SC, left_on = 'A', right_index=True)
A B (B, mean) (B, std)
0 b 8 6.666667 2.943920
3 b 9 6.666667 2.943920
4 b 2 6.666667 2.943920
5 b 9 6.666667 2.943920
6 b 4 6.666667 2.943920
9 b 8 6.666667 2.943920
1 a 8 4.500000 4.654747
2 a 1 4.500000 4.654747
7 a 9 4.500000 4.654747
8 a 0 4.500000 4.654747
If you want, you can get your merged dataframe in the original order just by adding .sort_index
:
df.merge(SC, left_on = 'A', right_index=True).sort_index()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With