For example, say I have a DataFrame that looks like this:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "grp": ["a", "a", "a", "b", "b", "c", "c", "c", "d"],
    "col1": ["1", "2", np.nan, "4", "5", np.nan, "6", "7", np.nan]
})
  grp col1
0   a    1
1   a    2
2   a  NaN
3   b    4
4   b    5
5   c  NaN
6   c    6
7   c    7
8   d  NaN
For each group defined by the grp column, I want to drop the rows where col1 is NaN. The constraint is that I do not want to drop such rows when that would remove the group entirely, i.e. when every col1 value in the group is NaN (as with group d).
I would expect the output DataFrame to look like this:
df2 = pd.DataFrame({
    "grp": ["a", "a", "b", "b", "c", "c", "d"],
    "col1": ["1", "2", "4", "5", "6", "7", np.nan]
})
# notice the NaN kept for grp == "d"
  grp col1
0   a    1
1   a    2
2   b    4
3   b    5
4   c    6
5   c    7
6   d  NaN
I managed to come up with a solution, but it's clunky. Is there a more succinct way of solving this? I also don't understand why the values were cast to strings...
# flatten each group's values into one array, then re-expand with explode
df1_grp = df1.groupby("grp")['col1'].apply(np.hstack).to_frame().reset_index()
# drop the stringified "nan" entries, but only in groups with more than one value
df1_grp['col1'] = df1_grp['col1'].apply(
    lambda x: [float(_) for _ in x if _ != "nan"] if len(x) > 1 else x
)
df1_grp.explode('col1')
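A quick note on the string casting first: col1 holds Python strings, so the column has object dtype. np.hstack unpacks each group's Series element by element, and concatenating the resulting one-element string arrays with the float array holding np.nan promotes everything to a NumPy string dtype, so np.nan becomes the literal string "nan" your lambda tests against. A minimal demonstration, using only the imports above:
np.hstack(pd.Series(["1", "2", np.nan]))
# array(['1', '2', 'nan'], dtype='<U32')  <- np.nan was stringified
If col1 held actual numbers instead of strings, the casting would not happen.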
For the filtering itself, use GroupBy.transform with 'all' to test, per group, whether every value of col1 is NaN, then combine the inverted NaN mask with that group mask using | (bitwise OR):
m = df1['col1'].isna()                       # rows where col1 is NaN
m1 = m.groupby(df1["grp"]).transform('all')  # rows whose entire group is NaN
df = df1[~m | m1]
print(df)
  grp col1
0   a    1
1   a    2
3   b    4
4   b    5
6   c    6
7   c    7
8   d  NaN
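To make the logic concrete, here is what the masks evaluate to on the sample data (the values follow directly from df1 above):
print(m.tolist())
# [False, False, True, False, False, True, False, False, True]
print(m1.tolist())
# [False, False, False, False, False, False, False, False, True]
print((~m | m1).tolist())
# [True, True, False, True, True, False, True, True, True]
# row 8 survives only because every col1 value in group "d" is NaN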
Or you can instead identify the groups that contain at least one non-missing value, and keep the other groups whole:
m = df1['col1'].notna()                  # rows with a real value
m1 = df1['grp'].isin(df1.loc[m, 'grp'])  # rows whose group has at least one real value
df = df1[m | ~m1]
print(df)
  grp col1
0   a    1
1   a    2
3   b    4
4   b    5
6   c    6
7   c    7
8   d  NaN
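If you want a single expression, the same per-group test can be folded into one transform with a lambda; this is just a compact sketch of the first approach, not a different technique:
# keep a row if its col1 is non-NaN, or if its entire group is NaN
mask = df1.groupby('grp')['col1'].transform(lambda s: s.notna() | s.isna().all())
df = df1[mask]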