I am trying to flatten rows and keep the info from the rows I want.
What I have:
id var1 var2 var3
1 Y N Y
1 N Y
2 Y N
2 N Y N
2 Y N Y
What I would like:
id var1 var2 var3
1 Y N Y
2 Y Y Y
Essentially, for each id and each column, the result should be Y if any row has a Y, and N otherwise, so Y always takes priority. There are also more columns than var1, var2, var3, so I would like something general that I can apply to other columns as well.
Let's try this: you can use groupby and sum to act like an OR, hence "giving Y priority":
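# Map Y/N to booleans so that summing within a group behaves like a logical OR.
df1 = df.replace({'Y': True, 'N': False})
df_out = (df1.groupby('id').sum()                 # missing values are skipped by default
             .astype(bool)                        # a non-zero count means at least one Y
             .replace({True: 'Y', False: 'N'})    # back to Y/N labels
             .reset_index())
print(df_out)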
Output:
id var1 var2 var3
0 1 Y N Y
1 2 Y Y Y
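For a version you can run end to end, here is a minimal sketch of the same "Y wins" idea using a boolean comparison and any() instead of sum. The sample frame is modeled on the question, but the positions of the blank cells are an assumption (the table in the question does not show which column is empty), and flatten_yes_no is a hypothetical helper name, not something from pandas.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question. Which cells are blank is an
# assumption: here the missing values are placed in var2.
df = pd.DataFrame({
    'id':   [1, 1, 2, 2, 2],
    'var1': ['Y', 'N', 'Y', 'N', 'Y'],
    'var2': ['N', np.nan, np.nan, 'Y', 'N'],
    'var3': ['Y', 'Y', 'N', 'N', 'Y'],
})

def flatten_yes_no(frame, key='id'):
    """Collapse rows per key; a column is 'Y' if any row in the group is 'Y'."""
    # Compare every non-key column to 'Y'; missing cells compare as False.
    is_yes = frame.set_index(key).eq('Y')
    # Logical OR within each group.
    any_yes = is_yes.groupby(level=0).any()
    # Convert the booleans back to Y/N labels.
    labels = pd.DataFrame(np.where(any_yes, 'Y', 'N'),
                          index=any_yes.index, columns=any_yes.columns)
    return labels.reset_index()

df_out = flatten_yes_no(df)
print(df_out)
```

Because the comparison is applied to every column except the key, this should carry over to frames with more columns than var1, var2, var3 without changes. Note that a blank cell is treated the same as N here.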