How to drop NaN elements in a groupby on a pandas dataframe?

Suppose I have this DataFrame:

import numpy as np
import pandas as pd

my_df = pd.DataFrame({'A':[np.nan,np.nan,'gate','ball'],'B':['car',np.nan,np.nan,np.nan],'C':[np.nan,'edge',np.nan,np.nan],'D':['id1','id1','id1','id2']})

In [176]: my_df
Out[176]:
      A    B     C    D
0   NaN  car   NaN  id1
1   NaN  NaN  edge  id1
2  gate  NaN   NaN  id1
3  ball  NaN   NaN  id2

I want to group by column D and ignore the NaN values. Expected output:

        A    B     C
D
id1  gate  car  edge
id2  ball  NaN   NaN

My current solution is to fill the NaN values with an empty string and take the max:

In [177]: my_df.fillna("").groupby("D").max()
Out[177]:
        A    B     C
D
id1  gate  car  edge
id2  ball
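This works because the empty string compares less than any non-empty string, so `max` keeps the one real value in each column of a group. The drawback shows up where a column is entirely missing for a group: the `""` placeholder survives instead of NaN. A minimal self-contained reproduction:

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'A': [np.nan, np.nan, 'gate', 'ball'],
                      'B': ['car', np.nan, np.nan, np.nan],
                      'C': [np.nan, 'edge', np.nan, np.nan],
                      'D': ['id1', 'id1', 'id1', 'id2']})

# "" sorts before any non-empty string, so max() keeps the real value
filled_max = my_df.fillna("").groupby("D").max()

# id2 has nothing in B or C, so those cells keep the "" placeholder
print(repr(filled_max.loc["id2", "B"]))
```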

Is there another solution without fillna("")?

asked Dec 23 '22 12:12 by G F


2 Answers

Use a custom function with dropna, and return NaN for groups where every value is missing:

print(my_df.groupby("D").agg(lambda x: np.nan if x.isnull().all() else x.dropna()))
        A    B     C
D                   
id1  gate  car  edge
id2  ball  NaN   NaN
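The lambda above returns `x.dropna()`, a one-element Series, and appears to rely on pandas unpacking that into a single cell. A slightly more defensive sketch returns a scalar explicitly. The helper name `first_notnull` is hypothetical, and it assumes that keeping the first value is acceptable if a group ever has several non-null values in the same column:

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'A': [np.nan, np.nan, 'gate', 'ball'],
                      'B': ['car', np.nan, np.nan, np.nan],
                      'C': [np.nan, 'edge', np.nan, np.nan],
                      'D': ['id1', 'id1', 'id1', 'id2']})


def first_notnull(x: pd.Series):
    """Hypothetical helper: first non-null value of a group column, else NaN.

    Assumption: if a group has several non-null values, keeping the first is fine.
    """
    non_null = x.dropna()
    # Always return a scalar, never a Series
    return non_null.iloc[0] if len(non_null) else np.nan


result = my_df.groupby("D").agg(first_notnull)
print(result)
```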

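A similar solution with a named custom function:

def f(x):
    # Drop the missing values in this group's column
    y = x.dropna()
    # Fall back to NaN when the whole group was missing
    return np.nan if y.empty else y

print(my_df.groupby("D").agg(f))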
        A    B     C
D                   
id1  gate  car  edge
id2  ball  NaN   NaN
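For data shaped like this, with at most one non-null value per column in each group, pandas' built-in `GroupBy.first()` looks like a simpler option: it returns the first non-null entry of each column within each group, so no custom function is needed. A minimal sketch:

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'A': [np.nan, np.nan, 'gate', 'ball'],
                      'B': ['car', np.nan, np.nan, np.nan],
                      'C': [np.nan, 'edge', np.nan, np.nan],
                      'D': ['id1', 'id1', 'id1', 'id2']})

# first() skips missing values and takes the first real entry per group/column
result = my_df.groupby("D").first()
print(result)
```

Cells where a group has no value at all come back as missing; test them with `pd.isna` rather than comparing against `np.nan`, since NaN never compares equal to itself.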
answered Dec 28 '22 07:12 by jezrael


Your approach is probably the better one, but add a replace at the end to turn the empty strings back into NaN:

my_df.fillna("").groupby("D").max().replace('',np.nan) 
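The round trip can be checked end to end: the sketch below rebuilds the sample frame and confirms that the all-missing cells are NaN again. One caveat of this design is that `replace('', np.nan)` would also turn any genuine empty strings in the original data into NaN.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'A': [np.nan, np.nan, 'gate', 'ball'],
                      'B': ['car', np.nan, np.nan, np.nan],
                      'C': [np.nan, 'edge', np.nan, np.nan],
                      'D': ['id1', 'id1', 'id1', 'id2']})

# Fill with "", take the max per group, then map the "" placeholders back to NaN
result = my_df.fillna("").groupby("D").max().replace("", np.nan)
print(result)
```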

You can also do:

def get_notnull(x):
    # Keep only the non-null values of this group's column
    if x.notnull().any():
        return x[x.notnull()]
    else:
        # Every value in the group is missing
        return np.nan

my_df.groupby('D').agg(get_notnull)

Output:

        A    B     C
D                   
id1  gate  car  edge
id2  ball  NaN   NaN
answered Dec 28 '22 07:12 by Bharath