Element-wise average and standard deviation across multiple dataframes

Question

Data: Multiple dataframes of the same format (same columns, an equal number of rows, and no points missing).

How do I create a "summary" dataframe that contains an element-wise mean for every element? How about a dataframe that contains an element-wise standard deviation?

          A         B         C
0 -1.624722 -1.160731  0.016726
1 -1.565694  0.989333  1.040820
2 -0.484945  0.718596 -0.180779
3  0.388798 -0.997036  1.211787
4 -0.249211  1.604280 -1.100980
5  0.062425  0.925813 -1.810696
6  0.793244 -1.860442 -1.196797

          A         B         C
0  1.016386  1.766780  0.648333
1 -1.101329 -1.021171  0.830281
2 -1.133889 -2.793579  0.839298
3  1.134425  0.611480 -1.482724
4 -0.066601 -2.123353  1.136564
5 -0.167580 -0.991550  0.660508
6  0.528789 -0.483008  1.472787

Alexander · Accepted Answer

You can create a panel of your DataFrames and then compute the mean and SD along the items axis:

df1 = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'])
df2 = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'])
df3 = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'])
p = pd.Panel({n: df for n, df in enumerate([df1, df2, df3])})

>>> p.mean(axis=0)
          A         B         C
0 -0.024284 -0.622337  0.581292
1  0.186271  0.596634 -0.498755
2  0.084591 -0.760567 -0.334429
3 -0.833688  0.403628  0.013497
4  0.402502 -0.017670 -0.369559
5  0.733305 -1.311827  0.463770
6 -0.941334  0.843020 -1.366963
7  0.134700  0.626846  0.994085
8 -0.783517  0.703030 -1.187082
9 -0.954325  0.514671 -0.370741

>>> p.std(axis=0)
          A         B         C
0  0.196526  1.870115  0.503855
1  0.719534  0.264991  1.232129
2  0.315741  0.773699  1.328869
3  1.169213  1.488852  1.149105
4  1.416236  1.157386  0.414532
5  0.554604  1.022169  1.324711
6  0.178940  1.107710  0.885941
7  1.270448  1.023748  1.102772
8  0.957550  0.355523  1.284814
9  0.582288  0.997909  1.566383

JohnE · Answer

One simple solution here is to simply concatenate the existing dataframes into a single dataframe while adding an ID variable to track the original source:

dfa = pd.DataFrame( np.random.randn(2,2), columns=['a','b'] ).assign(id='a')
dfb = pd.DataFrame( np.random.randn(2,2), columns=['a','b'] ).assign(id='b')

df = pd.concat([df1,df2])

          a         b id
0 -0.542652  1.609213  a
1 -0.192136  0.458564  a
0 -0.231949 -0.000573  b
1  0.245715 -0.083786  b

So now you have two 2x2 dataframes combined into a single 4x2 dataframe. The 'id' columns identifies the source dataframe so you haven't lost any generality, and can select on 'id' to do the same thing you would to any single dataframe. E.g. df[ df['id'] == 'a' ].

But now you can also use groupby to do any pandas method such as mean() or std() on an element by element basis:

df.groupby('id').mean()

              a         b
index                    
0      0.198164 -0.811475
1      0.639529  0.812810

Element-wise average and standard deviation across multiple dataframes

Tags:

python

python-3.x

pandas

Alealeale

2 Answers

Alexander

JohnE

Recent Activity

Donate For Us

Element-wise average and standard deviation across multiple dataframes

Tags:

python

python-3.x

pandas

Alealeale

2 Answers

Alexander

JohnE

Related questions

Recent Activity

Donate For Us