I have a DataFrame with column multi-index:
System A B
Trial Exp1 Exp2 Exp1 Exp2
1 NaN 1 2 3
2 4 5 NaN NaN
3 6 NaN 7 8
Turns out for each system (A, B
) and each measurement (1, 2, 3
in index), results from Exp1
is always superior to Exp2
. So I want to generate a 3rd column for each system, call it Final
, that should take Exp1
whenever available, and default to Exp2
otherwise. The desired result is
System A B
Trial Exp1 Exp2 Final Exp1 Exp2 Final
1 NaN 1 1 2 3 2
2 4 5 4 NaN NaN NaN
3 6 NaN 6 7 8 7
What is the best way to do this?
I've tried to use groupby
on the columns:
grp = df.groupby(level=0, axis=1)
And was thinking of using either transform
or apply
combined by assign
to achieve it. But am not able to find either a working or an efficient way of doing it. Specifically I am avoiding native python for
loops for efficiency reasons (else the problem is trivial).
Use stack
for reshape, add column with fillna
and then reshape back by unstack
with swaplevel
+ sort_index
:
df = df.stack(level=0)
df['Final'] = df['Exp1'].fillna(df['Exp1'])
df = df.unstack().swaplevel(0,1,axis=1).sort_index(axis=1)
print (df)
System A B
Trial Exp1 Exp2 Final Exp1 Exp2 Final
1 NaN 1.0 NaN 2.0 3.0 2.0
2 4.0 5.0 4.0 NaN NaN NaN
3 6.0 NaN 6.0 7.0 8.0 7.0
Another solution with xs
for select DataFrames
, create new DataFrame
by combine_first
, but there is missing second level - was added by MultiIndex.from_product
and last concat
both DataFrames
together:
a = df.xs('Exp1', axis=1, level=1)
b = df.xs('Exp2', axis=1, level=1)
df1 = a.combine_first(b)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['Final']])
df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
System A B
Trial Exp1 Exp2 Final Exp1 Exp2 Final
1 NaN 1.0 1.0 2.0 3.0 2.0
2 4.0 5.0 4.0 NaN NaN NaN
3 6.0 NaN 6.0 7.0 8.0 7.0
Similar solution with rename
:
a = df.xs('Exp1', axis=1, level=1, drop_level=False)
b = df.xs('Exp2', axis=1, level=1, drop_level=False)
df1 = a.rename(columns={'Exp1':'Final'}).combine_first(b.rename(columns={'Exp2':'Final'}))
df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
System A B
Trial Exp1 Exp2 Final Exp1 Exp2 Final
1 NaN 1.0 1.0 2.0 3.0 2.0
2 4.0 5.0 4.0 NaN NaN NaN
3 6.0 NaN 6.0 7.0 8.0 7.0
stack
with your first level of the column index stack(0)
leaving ['Exp1', 'Exp2']
in the column indexlambda
function that gets applied to the whole dataframe within an assign
call.unstack
, swaplevel
, sort_index
to clean it up and put everything where it belongs.f = lambda x: x.Exp1.fillna(x.Exp2)
df.stack(0).assign(Final=f).unstack() \
.swaplevel(0, 1, 1).sort_index(1)
A B
Exp1 Exp2 Final Exp1 Exp2 Final
1 NaN 1.0 1.0 2.0 3.0 2.0
2 4.0 5.0 4.0 NaN NaN NaN
3 6.0 NaN 6.0 7.0 8.0 7.0
Another concept using xs
d1 = df.xs('Exp1', 1, 1).fillna(df.xs('Exp2', 1, 1))
d1.columns = [d1.columns, ['Final'] * len(d1.columns)]
pd.concat([df, d1], axis=1).sort_index(1)
A B
Exp1 Exp2 Final Exp1 Exp2 Final
1 NaN 1.0 1.0 2.0 3.0 2.0
2 4.0 5.0 4.0 NaN NaN NaN
3 6.0 NaN 6.0 7.0 8.0 7.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With