 

Mean and standard deviation with multiple dataframes

Tags:

python

pandas

I have multiple dataframes with the same columns and the same number of observations. For example:

import pandas as pd

d1 = {'ID': ['A', 'B', 'C', 'D'], 'Amount': [1, 2, 3, 4]}
df1 = pd.DataFrame(data=d1)

d2 = {'ID': ['A', 'B', 'C', 'D'], 'Amount': [6, 0, 1, 5]}
df2 = pd.DataFrame(data=d2)

d3 = {'ID': ['A', 'B', 'C', 'D'], 'Amount': [8, 1, 2, 3]}
df3 = pd.DataFrame(data=d3)

I need to drop one ID (D) and its corresponding value from each of the dataframes and then, for each remaining ID, calculate the mean and standard deviation of Amount across the dataframes. The expected output should be:

  avg   std
A   5    ...
B  ...   ...
C  ...   ...

Generally, for one dataframe, I would drop what I don't need and then compute the average using mean() and the standard deviation using std().
How can I do this in an easy and fast way with multiple dataframes? (I have at least 10 of them.)

LdM asked May 06 '26 03:05


1 Answer

Use concat to stack the dataframes, remove the D rows with DataFrame.query, and aggregate with GroupBy.agg using named aggregations:

df = (pd.concat([df1, df2, df3])
        .query('ID != "D"')
        .groupby('ID')
        .agg(avg=('Amount', 'mean'), std=('Amount', 'std')))
print(df)
    avg       std
ID               
A     5  3.605551
B     1  1.000000
C     2  1.000000
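
As a quick sanity check on the std column: the 'std' aggregation in pandas computes the sample standard deviation (ddof=1) by default, so the divisor is n - 1 rather than n. The sketch below redoes the arithmetic for ID A by hand using only the standard library; the variable names are illustrative and not part of the answer's code.

```python
import math

# Amount values for ID 'A' taken from df1, df2 and df3 in the question
amounts_a = [1, 6, 8]

# Mean: (1 + 6 + 8) / 3
mean_a = sum(amounts_a) / len(amounts_a)

# Sample variance: squared deviations divided by n - 1 (ddof=1)
var_a = sum((x - mean_a) ** 2 for x in amounts_a) / (len(amounts_a) - 1)

# Sample standard deviation
std_a = math.sqrt(var_a)

print(mean_a, round(std_a, 6))
```

If the population standard deviation (divisor n) is what you need instead, the result will differ from the table above.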

Or remove D in the last step with DataFrame.drop:

df = (pd.concat([df1, df2, df3])
        .groupby('ID')
        .agg(avg=('Amount', 'mean'), std=('Amount', 'std'))
        .drop('D'))
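
With ten or more dataframes, it may be easier to keep them in a list and pass that list to concat instead of naming each one. The following is a minimal sketch under that assumption; the `frames` list, the `summarize` helper, and its `exclude` parameter are hypothetical names introduced here for illustration, and the data is the sample from the question.

```python
import pandas as pd

# Hypothetical list standing in for the 10+ dataframes (sample data from the question)
frames = [
    pd.DataFrame({'ID': ['A', 'B', 'C', 'D'], 'Amount': [1, 2, 3, 4]}),
    pd.DataFrame({'ID': ['A', 'B', 'C', 'D'], 'Amount': [6, 0, 1, 5]}),
    pd.DataFrame({'ID': ['A', 'B', 'C', 'D'], 'Amount': [8, 1, 2, 3]}),
]


def summarize(frames, exclude=('D',)):
    """Stack the frames, drop the excluded IDs, and aggregate Amount per ID."""
    combined = pd.concat(frames, ignore_index=True)
    # Keep only rows whose ID is not in the exclusion list
    kept = combined[~combined['ID'].isin(list(exclude))]
    # Named aggregations: mean and sample standard deviation (ddof=1)
    return kept.groupby('ID').agg(avg=('Amount', 'mean'), std=('Amount', 'std'))


result = summarize(frames)
print(result)
```

Using isin here, rather than a query string, makes it straightforward to exclude several IDs at once by passing a longer tuple.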

jezrael answered May 09 '26 14:05


