I have multiple dataframes having the same columns and the same number of observations: For example
d1 = {'ID': ['A','B','C','D'], 'Amount':
[1,2,3,4]}
df1 =pd.DataFrame(data=d1)
d2 = {'ID': ['A','B','C','D'], 'Amount':
[6,0,1,5]}
df2 =pd.DataFrame(data=d2)
d3 = {'ID': ['A','B','C','D'], 'Amount':
[8,1,2,3]}
df3 =pd.DataFrame(data=d3)
I need to drop one column (D) and its corresponding value in each of the dataframes and then, for each variable, calculating the mean and standard deviation. The expected output should be
avg std
A 5 ...
B ... ...
C ... ...
Generally, for one dataframe, I would use drop columns and then I would compute the average using mean() and the standard deviation std().
How can I do this in an easy and fast way with multiple dataframes? (I have at least 10 of them).
Use concat with remove D in DataFrame.query and aggregate by GroupBy.agg with named aggregations:
df = (pd.concat([df1, df2, df3])
.query('ID != "D"')
.groupby('ID')
.agg(avg=('Amount', 'mean'), std=('Amount', 'std')))
print (df)
avg std
ID
A 5 3.605551
B 1 1.000000
C 2 1.000000
Or remove D in last step by DataFrame.drop:
df = (pd.concat([df1, df2, df3])
.groupby('ID')
.agg(avg=('Amount', 'mean'), std=('Amount', 'std'))
.drop('D'))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With