
Merge rows in DataFrame by removing nan's after groupby

Given a DataFrame like the following:

import numpy as np
import pandas as pd
from pandas import DataFrame

idx = pd.MultiIndex.from_product([["Project 1", "Project 2"], range(1,3)],
                         names=['Project', 'Ord'])
df = DataFrame({'a': ["foo", np.nan, np.nan, "bar"],
                'b': [np.nan, "one", "two", np.nan]},
               index=idx)

Out:

                 a    b
Project   Ord          
Project 1 1    foo  NaN
          2    NaN  one
Project 2 1    NaN  two
          2    bar  NaN

I would like to merge the rows that share the same outer index value (note that within each group, every column has exactly one non-NaN value).

The current solution I have involves two groupby-operations:

df.index = df.index.droplevel(1)
df.groupby(df.index).ffill().groupby(df.index).last()

and gives me the intended result:

Out:
             a    b
Project            
Project 1  foo  one
Project 2  bar  two

Having to use two groupby-operations seems excessive, since all I need is an aggregation-function that returns the single non-nan value from a list. However, I cannot think of a way to use dropna as an aggregation-function.

Patrick Allo asked Jan 06 '23 03:01



2 Answers

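You can use reset_index, stack and unstack. Here stack drops the NaN entries (its default behavior when this was written), so each (Project, column) pair keeps a single value, and unstack pivots those values back into columns: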

In [131]: df.reset_index(level=1, drop=True).stack().unstack()
Out[131]:
             a    b
Project
Project 1  foo  one
Project 2  bar  two
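
For anyone who wants to see the intermediate step, here is a minimal sketch of the same idea on the question's data. The explicit dropna() call and the variable names are my additions, not part of the answer above. The extra call means the sketch does not depend on stack() dropping NaN on its own, which has differed between pandas versions.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
idx = pd.MultiIndex.from_product(
    [["Project 1", "Project 2"], range(1, 3)], names=["Project", "Ord"]
)
df = pd.DataFrame(
    {"a": ["foo", np.nan, np.nan, "bar"], "b": [np.nan, "one", "two", np.nan]},
    index=idx,
)

# Drop the inner 'Ord' level so only 'Project' remains in the index.
flat = df.reset_index(level=1, drop=True)

# Long format: one entry per (Project, column) pair.
# dropna() is added here (an assumption, see lead-in) to guarantee NaN removal.
stacked = flat.stack().dropna()

# Pivot the column labels back out; each pair is now unique, so this is safe.
merged = stacked.unstack()
print(merged)
```

Note that unstack raises an error if a (Project, column) pair still has more than one value, so this only works when each group really has a single non-NaN entry per column.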
MaxU - stop WAR against UA answered Jan 08 '23 09:01



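The last method on a groupby returns the last non-NaN value in each column of each group, so a single groupby is enough. first would accomplish the same thing in this case, since each group has only one valid value per column.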

df.groupby(level='Project').last()
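
A self-contained sketch of this on the question's data, in case it helps. The third variant is my own illustration of the "dropna as an aggregation function" idea from the question, not something this answer proposed. It assumes every group has at least one non-NaN value per column, otherwise iloc[0] raises an IndexError.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
idx = pd.MultiIndex.from_product(
    [["Project 1", "Project 2"], range(1, 3)], names=["Project", "Ord"]
)
df = pd.DataFrame(
    {"a": ["foo", np.nan, np.nan, "bar"], "b": [np.nan, "one", "two", np.nan]},
    index=idx,
)

# Grouping by index level name avoids dropping 'Ord' beforehand.
merged_last = df.groupby(level="Project").last()    # last non-NaN per column
merged_first = df.groupby(level="Project").first()  # first non-NaN per column

# Hypothetical alternative: an explicit dropna-based aggregation.
# Slower than first/last and fails on all-NaN groups (assumption noted above).
merged_dropna = df.groupby(level="Project").agg(lambda s: s.dropna().iloc[0])

print(merged_last)
```

The built-in first/last are the better choice in practice, because they also cope with a group whose column is entirely NaN by returning a missing value instead of raising.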


piRSquared answered Jan 08 '23 11:01
