Pandas - how to drop nan rows within a group, but only if there's more than one row

For example, say I have a DataFrame that looks like this:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "grp": ["a", "a", "a", "b", "b", "c", "c", "c", "d"],
    "col1": ["1", "2", np.nan, "4", "5", np.nan, "6", "7", np.nan]
})

    grp col1
0   a   1
1   a   2
2   a   NaN
3   b   4
4   b   5
5   c   NaN
6   c   6
7   c   7
8   d   NaN

Within each group defined by the grp column, I want to drop the rows where col1 is NaN.

The constraint is that I only want to drop these rows when there are multiple rows within the group; a group with a single row, such as d, should keep its NaN.

I would expect the output DataFrame to look like this.

df2 = pd.DataFrame({
    "grp": ["a", "a", "b", "b", "c", "c", "d"],
    "col1": ["1", "2", "4", "5", "6", "7", np.nan]
})

# notice the NaN in `grp`=="d"

    grp col1
0   a   1
1   a   2
2   b   4
3   b   5
4   c   6
5   c   7
6   d   NaN

I managed to come up with a solution, but it's clunky. Is there a more succinct way of solving this? I also don't understand why the values were cast to strings...

df1_grp = df1.groupby("grp")['col1'].apply(np.hstack).to_frame().reset_index()
df1_grp['col1'] = df1_grp['col1'].apply(lambda x: [float(_) for _ in x if _!="nan"] if len(x)>1 else x)
df1_grp.explode('col1')
Ian asked Dec 30 '22 13:12


1 Answer

Use GroupBy.transform with GroupBy.all to test whether all values in a group are NaN, then chain the inverted mask with | for bitwise OR:

m = df1['col1'].isna()
m1 = m.groupby(df1["grp"]).transform('all')

df = df1[~m | m1]
print (df)
  grp col1
0   a    1
1   a    2
3   b    4
4   b    5
6   c    6
7   c    7
8   d  NaN
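
To see why the combined mask works, it may help to look at the two intermediate masks side by side. The sketch below rebuilds df1 from the question and is only an illustration; the name `result` is introduced here and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "grp": ["a", "a", "a", "b", "b", "c", "c", "c", "d"],
    "col1": ["1", "2", np.nan, "4", "5", np.nan, "6", "7", np.nan],
})

# m: True where col1 is missing (rows 2, 5 and 8)
m = df1["col1"].isna()

# m1: True for every row of a group whose col1 values are all missing
# (only group "d" in this data)
m1 = m.groupby(df1["grp"]).transform("all")

# Keep a row if it is not missing, or if its whole group is missing
result = df1[~m | m1]

print(pd.DataFrame({"grp": df1["grp"], "m": m, "m1": m1}))
print(result.index.tolist())  # [0, 1, 3, 4, 6, 7, 8]
```

The filtered frame keeps the original index labels. If the 0 to 6 index shown in the expected df2 is wanted, `result.reset_index(drop=True)` should produce it.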

Or keep the non-missing rows together with the rows of groups that contain only missing values:

m = df1['col1'].notna()
m1 = df1['grp'].isin(df1.loc[m, 'grp'])


df = df1[m | ~m1]
print (df)
  grp col1
0   a    1
1   a    2
3   b    4
4   b    5
6   c    6
7   c    7
8   d  NaN
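
On the side question about strings: col1 already holds the strings "1", "2" and so on next to float NaN, and np.hstack has to fit all values of a group into one NumPy array with a single dtype. A minimal sketch, assuming this dtype promotion is what happened in the question's groupby step:

```python
import numpy as np

# Mixed strings and a float NaN, like one group of col1 in the question
stacked = np.hstack(["1", "2", np.nan])

print(stacked.dtype.kind)  # "U" means a NumPy unicode string dtype
print(stacked.tolist())    # the NaN is now the text "nan"
```

That would explain why the question's lambda has to compare against the text "nan". The mask-based approaches stay inside pandas, so the NaN values are never converted.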
jezrael answered Jan 02 '23 04:01
