How to merge an aggregate output back to original dataframe

df = [['A', 'B']]  (the dataframe is actually bigger; it is reduced to two columns here for simplicity)

SC = df[['A','B']].groupby('A').agg({'B': ['mean', 'std']})

I'm trying to get the output of this back into the original df, so that the result looks like:

df=[['A','B','mean of B','std of B']]

I tried pd.merge(df, SC, on=None) and got this error:

"MergeError: No common columns to perform merge on"

Any help would be greatly appreciated, ideally with a simple solution.

Thank you

asked Jun 02 '18 22:06

konsama

2 Answers

groupby transform

One solution is to perform two groupby.transform calculations:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 3, (50, 2)), columns=['A', 'B'])

df['mean'] = df.groupby('A')['B'].transform('mean')
df['std'] = df.groupby('A')['B'].transform('std')

print(df.head())

   A  B      mean       std
0  0  2  0.866667  0.915475
1  2  2  1.187500  0.910586
2  1  1  0.947368  0.911268
3  1  0  0.947368  0.911268
4  0  2  0.866667  0.915475
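Because the example above uses random data, its numbers change on every run. The following is a minimal sketch of the same transform approach on a small fixed frame, so the result can be checked by hand. The data values, the `grouped` and `result` names, and the column labels 'mean of B' / 'std of B' (taken from the question's desired output) are illustrative assumptions.

```python
import pandas as pd

# Small fixed frame (hypothetical data) so the result is reproducible
df = pd.DataFrame({'A': ['a', 'a', 'b', 'b', 'b'],
                   'B': [1, 3, 2, 4, 6]})

# Build the grouped selection once and reuse it for both statistics
grouped = df.groupby('A')['B']

# transform returns a Series aligned to df's index, so each row
# receives the statistic of its own group
result = df.assign(**{
    'mean of B': grouped.transform('mean'),
    'std of B': grouped.transform('std'),   # sample std (ddof=1)
})

print(result)
```

Since transform keeps the original index, no merge step is needed and the row order of df is unchanged.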

groupby agg + merge

Alternatively, you can perform a single groupby aggregation. Then align indices and merge:

# grp dataframe is indexed by A
grp = df.groupby('A')['B'].agg(['mean', 'std'])

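# set same index for df, perform merge on indices, then reset index
# (this assumes df does not already contain 'mean' and 'std' columns;
# if it does, merge adds _x/_y suffixes to the overlapping names)
res = (df.set_index('A')
         .merge(grp, left_index=True, right_index=True)
         .reset_index())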
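A variation on the agg + merge idea, offered as a sketch rather than as part of the original answer: named aggregation produces flat, descriptive column names directly, and merging with left_on='A' avoids the set_index / reset_index round trip. The data and the names `mean_B` and `std_B` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'A': ['b', 'a', 'a', 'b'],
                   'B': [8, 8, 1, 9]})

# Named aggregation: keyword = output column name, value = statistic
grp = df.groupby('A')['B'].agg(mean_B='mean', std_B='std')

# Match column A of df against the index of grp;
# how='left' keeps every row of df in its original order
res = df.merge(grp, left_on='A', right_index=True, how='left')

print(res)
```

The left join is a deliberate choice here: it guarantees that no rows of df are dropped or reordered, even if a group key were missing from grp.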

answered Oct 13 '22 00:10

jpp

I think you just have to specify the columns to merge on:

df.merge(SC, left_on = 'A', right_index=True)

Example:

# Original Dataframe (randomly created):
>>> df
   A  B
0  b  8
1  a  8
2  a  1
3  b  9
4  b  2
5  b  9
6  b  4
7  a  9
8  a  0
9  b  8

# The result of your "SC" object created by groupby and agg
>>> SC
          B          
       mean       std
A                    
a  4.500000  4.654747
b  6.666667  2.943920

# Merge them together on the appropriate columns:
>>> df.merge(SC, left_on = 'A', right_index=True)
   A  B  (B, mean)  (B, std)
0  b  8   6.666667  2.943920
3  b  9   6.666667  2.943920
4  b  2   6.666667  2.943920
5  b  9   6.666667  2.943920
6  b  4   6.666667  2.943920
9  b  8   6.666667  2.943920
1  a  8   4.500000  4.654747
2  a  1   4.500000  4.654747
7  a  9   4.500000  4.654747
8  a  0   4.500000  4.654747

If you want, you can get your merged dataframe back in the original row order just by adding .sort_index():

df.merge(SC, left_on = 'A', right_index=True).sort_index()
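One caveat with the merge above: SC has two-level column labels such as ('B', 'mean'), while df has single-level ones. To my knowledge, newer pandas releases warn about or refuse a merge between frames with different numbers of column levels, so it may be safer to flatten SC's columns first. The sketch below assumes the sample data from the example above; the `flat` and `merged` names and the 'mean of B' style labels (borrowed from the question) are illustrative.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'A': list('baabbbbaab'),
                   'B': [8, 8, 1, 9, 2, 9, 4, 9, 0, 8]})

# The question's aggregation: columns are ('B', 'mean') and ('B', 'std')
SC = df.groupby('A').agg({'B': ['mean', 'std']})

# Flatten the two-level column labels into plain strings
flat = SC.copy()
flat.columns = [f'{stat} of {col}' for col, stat in flat.columns]

# Merge on column A against the group index, then restore row order
merged = df.merge(flat, left_on='A', right_index=True).sort_index()

print(merged)
```

After flattening, the merged frame has exactly the columns the question asked for: A, B, 'mean of B', and 'std of B'.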

answered Oct 13 '22 01:10

sacuL


Tags:

python

pandas

dataframe

pandas-groupby
