I have this DataFrame:
Id,ProductId,Product
1,100,a
1,100,x
1,100,NaN
2,150,NaN
3,150,NaN
4,100,a
4,100,x
4,100,NaN
I want to remove some of the rows that contain NaN, but not all of them. The criterion is: remove a NaN row only if its Id already has a non-missing value in the Product column. For example, Id 1 already has values in the Product column and also contains NaN, so that NaN row should be removed. But for Id 2, only NaN exists in the Product column, so I don't want to remove that one. Similarly for Id 3, there are only NaN values in the Product column, so I want to keep that one too.
Final Output would be like this
Id,ProductId,Product
1,100,a
1,100,x
2,150,NaN
3,150,NaN
4,100,a
4,100,x
Please don't use groupby if an alternative exists, because it is slow.
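For a reproducible setup, the sample frame above can be built like this (the empty Product fields parse as NaN automatically):

```python
import io
import pandas as pd

# Build the sample frame from the CSV shown in the question;
# missing Product values become NaN when parsed.
csv = """Id,ProductId,Product
1,100,a
1,100,x
1,100,
2,150,
3,150,
4,100,a
4,100,x
4,100,
"""
df = pd.read_csv(io.StringIO(csv))
print(df)
```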
# Ids that have at least one non-missing Product value
vals = df.loc[df['Product'].notnull(), 'Id'].unique()
# Drop rows whose Id is in vals and whose Product is NaN
df = df[~(df['Id'].isin(vals) & df['Product'].isnull())]
print(df)
Id ProductId Product
0 1 100 a
1 1 100 x
3 2 150 NaN
4 3 150 NaN
5 4 100 a
6 4 100 x
Explanation:
First get all Id values with at least one non-missing Product:
print(df.loc[df['Product'].notnull(), 'Id'].unique())
[1 4]
Then build a boolean mask of rows whose Id is in those groups and whose Product is missing:
print(df['Id'].isin(vals) & df['Product'].isnull())
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 True
dtype: bool
Invert the boolean mask:
print(~(df['Id'].isin(vals) & df['Product'].isnull()))
0 True
1 True
2 False
3 True
4 True
5 True
6 True
7 False
dtype: bool
And finally filter by boolean indexing:
print(df[~(df['Id'].isin(vals) & df['Product'].isnull())])
Id ProductId Product
0 1 100 a
1 1 100 x
3 2 150 NaN
4 3 150 NaN
5 4 100 a
6 4 100 x
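The two steps can also be combined into a single boolean expression without the intermediate `vals` variable; a minimal sketch on the sample data:

```python
import io
import pandas as pd

csv = """Id,ProductId,Product
1,100,a
1,100,x
1,100,
2,150,
3,150,
4,100,a
4,100,x
4,100,
"""
df = pd.read_csv(io.StringIO(csv))

# A row is dropped only when its Product is NaN AND its Id also
# appears among rows that do have a non-missing Product.
mask = df['Product'].isna() & df['Id'].isin(df.loc[df['Product'].notna(), 'Id'])
out = df[~mask]
print(out)
```

This is the same logic, just inlined, so performance should be essentially identical.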
You can group the DataFrame by Id and drop the NaN rows if the group has more than one element (this assumes Id is the index, e.g. after df.set_index('Id')):
>>> df.groupby(level='Id', group_keys=False).apply(lambda x: x.dropna() if len(x) > 1 else x)
ProductId Product
Id
1 100 a
1 100 x
2 150 NaN
3 150 NaN
4 100 a
4 100 x
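If Id is still a regular column rather than the index, the same idea can be sketched by grouping on the column directly:

```python
import io
import pandas as pd

csv = """Id,ProductId,Product
1,100,a
1,100,x
1,100,
2,150,
3,150,
4,100,a
4,100,x
4,100,
"""
df = pd.read_csv(io.StringIO(csv))

# Group by the Id column and drop NaN rows only in groups
# that contain more than one row.
out = df.groupby('Id', group_keys=False).apply(
    lambda g: g.dropna() if len(g) > 1 else g
)
print(out)
```

Note that an apply with a Python lambda runs once per group, so on large frames the vectorized boolean-indexing approach above will generally be faster.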