Suppose I have this DataFrame:
import numpy as np
import pandas as pd
my_df = pd.DataFrame({'A':[np.nan,np.nan,'gate','ball'],'B':['car',np.nan,np.nan,np.nan],'C':[np.nan,'edge',np.nan,np.nan],'D':['id1','id1','id1','id2']})
In [176]: my_df
Out[176]:
      A    B     C    D
0   NaN  car   NaN  id1
1   NaN  NaN  edge  id1
2  gate  NaN   NaN  id1
3  ball  NaN   NaN  id2
I want to group by column D and ignore the NaN values. Expected output:
        A    B     C
D
id1  gate  car  edge
id2  ball  NaN   NaN
My solution would be to fill NaN with an empty string and take the max:
In [177]: my_df.fillna("").groupby("D").max()
Out[177]:
        A    B     C
D
id1  gate  car  edge
id2  ball
Is there another solution without fillna("")?
Use a custom function with dropna, but return NaN for groups that have no non-null values:
print (my_df.groupby("D").agg(lambda x: np.nan if x.isnull().all() else x.dropna()))
        A    B     C
D
id1  gate  car  edge
id2  ball  NaN   NaN
A similar solution with a named function:
def f(x):
    y = x.dropna()
    return np.nan if y.empty else y
print (my_df.groupby("D").agg(f))
        A    B     C
D
id1  gate  car  edge
id2  ball  NaN   NaN
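One caveat with the two functions above: they return the result of dropna(), which is itself a Series. That works on this data because each group holds at most one non-null value per column. With several non-null values in a group, the aggregation may fail or produce nested results, depending on the pandas version. A minimal sketch of a variant that always returns a scalar, assuming the first non-null value is the one you want (the name `first_valid` and the frame `df2` are hypothetical, not from the original answers):

```python
import numpy as np
import pandas as pd


def first_valid(x):
    """Return the first non-null value of a group column, or NaN if there is none."""
    y = x.dropna()
    return np.nan if y.empty else y.iloc[0]


# Hypothetical frame: like my_df, but id1 now has two non-null values in column A.
df2 = pd.DataFrame({
    "A": [np.nan, "door", "gate", "ball"],
    "B": ["car", np.nan, np.nan, np.nan],
    "C": [np.nan, "edge", np.nan, np.nan],
    "D": ["id1", "id1", "id1", "id2"],
})

result = df2.groupby("D").agg(first_valid)
print(result)
```

Taking `y.iloc[0]` is a design choice; `y.iloc[-1]` or `y.max()` would be equally valid if the last or largest value is preferred.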
Your approach is probably the better one, I think, but add a replace at the end to turn the empty strings back into NaN:
my_df.fillna("").groupby("D").max().replace('',np.nan)
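For reference, the fillna/max/replace chain can be run end to end as below. This is only a sketch on the question's data; the variable name `result` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({
    "A": [np.nan, np.nan, "gate", "ball"],
    "B": ["car", np.nan, np.nan, np.nan],
    "C": [np.nan, "edge", np.nan, np.nan],
    "D": ["id1", "id1", "id1", "id2"],
})

# Fill NaN with "" so max() can compare strings, then map "" back to NaN.
result = my_df.fillna("").groupby("D").max().replace("", np.nan)
print(result)
```

Two things to keep in mind: max() picks the lexicographically largest string when a group has several values, and the final replace also converts any genuinely empty strings in the data to NaN.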
You can also do:
def get_notnull(x):
    if x.notnull().any():
        return x[x.notnull()]
    else:
        return np.nan
my_df.groupby('D').agg(get_notnull)
Output:
        A    B     C
D
id1  gate  car  edge
id2  ball  NaN   NaN
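Not mentioned in the answers above, but possibly the shortest option: the built-in GroupBy first() skips null values by default, so it may give the same result without a custom function. A sketch on the question's data (depending on the pandas version, the missing cells may show up as NaN or None, but both count as null):

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({
    "A": [np.nan, np.nan, "gate", "ball"],
    "B": ["car", np.nan, np.nan, np.nan],
    "C": [np.nan, "edge", np.nan, np.nan],
    "D": ["id1", "id1", "id1", "id2"],
})

# first() returns the first non-null entry of each column within each group.
result = my_df.groupby("D").first()
print(result)
```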