Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas mean() for multiindex

Tags:

python

pandas

I have df:

CU           Parameters           1       2       3
379-H   Output Energy, (Wh/h)   0.045   0.055   0.042
349-J   Output Energy, (Wh/h)   0.001   0.003   0
625-H   Output Energy, (Wh/h)   2.695   1.224   1.272
626-F   Output Energy, (Wh/h)   1.381   1.494   1.3

I would like to create two separate dfs, getting the mean of column values by grouping index on level 0 (CU):

df1: (379-H and 625-H)

Parameters                1     2      3
Output Energy, (Wh/h)    1.37   0.63   0.657

df2: (the rest)

Parameters                 1     2      3
Output Energy, (Wh/h)     0.69  0.74   0.65

I can get the mean for all using by grouping level 1:

df = df.apply(pd.to_numeric, errors='coerce').dropna(how='all').groupby(level=1).mean()

but how do I group these according to level 0?

SOLUTION:

lightsonly = ["379-H", "625-H"]
df = df.apply(pd.to_numeric, errors='coerce').dropna(how='all')
mask = df.index.get_level_values(0).isin(lightsonly)
df1 = df[mask].groupby(level=1).mean()
df2 = df[~mask].groupby(level=1).mean()
like image 367
warrenfitzhenry Avatar asked Sep 06 '25 02:09

warrenfitzhenry


1 Answers

Use get_level_values + isin for True and False index and then get mean with rename by dict:

d = {True: '379-H and 625-H', False: 'the rest'}
df.index = df.index.get_level_values(0).isin(['379-H', '625-H'])
df = df.mean(level=0).rename(d)
print (df)
                     1       2      3
the rest         0.691  0.7485  0.650
379-H and 625-H  1.370  0.6395  0.657

For separately dfs is possible also use boolean indexing:

mask= df.index.get_level_values(0).isin(['379-H', '625-H'])

df1 = df[mask].mean().rename('379-H and 625-H').to_frame().T
print (df1)
                    1       2      3
379-H and 625-H  1.37  0.6395  0.657

df2 = df[~mask].mean().rename('the rest').to_frame().T
print (df2)
              1       2     3
the rest  0.691  0.7485  0.65

Another numpy solution with DataFrame constructor:

a1 = df[mask].values.mean(axis=0)
#alternatively
#a1 = df.values[mask].mean(axis=0)
df1 = pd.DataFrame(a1.reshape(-1, len(a1)), index=['379-H and 625-H'], columns=df.columns)
print (df1)
                    1       2      3
379-H and 625-H  1.37  0.6395  0.657
like image 73
jezrael Avatar answered Sep 11 '25 01:09

jezrael