Given a DataFrame like the following:
import numpy as np
import pandas as pd
from pandas import DataFrame
idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=['Project', 'Ord'])
df = DataFrame({'a': ["foo", np.nan, np.nan, "bar"],
                'b': [np.nan, "one", "two", np.nan]},
               index=idx)
Out:
                 a    b
Project   Ord
Project 1 1    foo  NaN
          2    NaN  one
Project 2 1    NaN  two
          2    bar  NaN
I would like to merge the rows that share the same outer index value (note that in each case there is only one non-NaN value per column).
The current solution I have involves two groupby operations:
df.index = df.index.droplevel(1)
df.groupby(df.index).ffill().groupby(df.index).last()
and gives me the intended result:
Out:
             a    b
Project
Project 1  foo  one
Project 2  bar  two
Having to use two groupby operations seems excessive, since all I need is an aggregation function that returns the single non-NaN value from each group. However, I cannot think of a way to use dropna as an aggregation function.
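For what it's worth, dropna can be used inside a custom aggregation: a function passed to GroupBy.agg receives each group's column as a Series, so it can drop the NaN values and return the one that remains. A minimal sketch, assuming every group has at least one non-NaN value in every column (only_value is a hypothetical helper name, not a pandas function):

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=["Project", "Ord"])
df = pd.DataFrame({"a": ["foo", np.nan, np.nan, "bar"],
                   "b": [np.nan, "one", "two", np.nan]},
                  index=idx)


def only_value(s: pd.Series):
    """Hypothetical helper: return the single non-NaN value of one group's column."""
    non_null = s.dropna()
    # Assumes at least one non-NaN value exists; iloc[0] raises IndexError otherwise.
    return non_null.iloc[0]


# agg calls only_value once per (group, column) pair.
result = df.groupby(level="Project").agg(only_value)
print(result)
```

This is a single groupby, but a Python-level function runs for every group and column, so the built-in first()/last() reductions are likely to be faster on large frames.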
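You can use reset_index, stack and unstack. Dropping the inner level and stacking discards the NaN cells, which leaves one value per (Project, column) pair for unstack to spread back into columns: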
In [131]: df.reset_index(level=1, drop=True).stack().unstack()
Out[131]:
             a    b
Project
Project 1  foo  one
Project 2  bar  two
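The one-liner above can be unpacked step by step. It relies on NaN cells disappearing during stack, which is what the classic stack implementation does by default. As far as I know, pandas 2.1 added an opt-in new implementation (future_stack=True) that keeps NaN values and is intended to become the default, so the explicit dropna() step below is an added safeguard and not part of the original answer. A minimal sketch:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=["Project", "Ord"])
df = pd.DataFrame({"a": ["foo", np.nan, np.nan, "bar"],
                   "b": [np.nan, "one", "two", np.nan]},
                  index=idx)

# Step 1: drop the inner 'Ord' level so rows are labelled only by 'Project'
# (the labels are now duplicated: two rows per project).
flat = df.reset_index(level=1, drop=True)

# Step 2: move the column labels into the index -> Series keyed by (Project, column).
stacked = flat.stack()

# Step 3 (added safeguard): remove NaN entries explicitly. If they survived,
# each (Project, column) key would appear twice and unstack would refuse to reshape.
stacked = stacked.dropna()

# Step 4: move the column labels back out of the index, one value per cell.
result = stacked.unstack()
print(result)
```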
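The last() method on groupby takes the last non-null value in each column of each group. first() would accomplish the same thing in this case, since each group has only one non-null value per column. Grouping by the level name also works directly on the original MultiIndex, without dropping the Ord level first: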
df.groupby(level='Project').last()
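A minimal sketch that checks this claim on the example frame: it runs both reductions on the original MultiIndexed DataFrame and confirms that they agree.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1, 3)],
                                 names=["Project", "Ord"])
df = pd.DataFrame({"a": ["foo", np.nan, np.nan, "bar"],
                   "b": [np.nan, "one", "two", np.nan]},
                  index=idx)

# Group on the outer index level by name; the 'Ord' level is simply dropped.
grouped = df.groupby(level="Project")

# first()/last() skip NaN and take the first/last non-null value per column.
via_last = grouped.last()
via_first = grouped.first()

# With exactly one non-null value per column per group, both give the same frame.
print(via_last.equals(via_first))  # True
print(via_last)
```

If a group ever had two non-null values in the same column, first() and last() would return different values, so the choice only stops mattering under the "one non-NaN value" condition stated in the question.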