 

Pandas: Flatten column based on condition?

Tags: python, pandas

I am trying to flatten the rows for each id into a single row while keeping the information I want from them.

What I have:

id  var1  var2 var3
1      Y     N    Y
1      N          Y
2      Y          N
2      N     Y    N
2      Y     N    Y
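To reproduce the example, here is a minimal sketch that builds the table above as a DataFrame. It assumes the blank cells are missing values (None, which pandas stores as NaN) rather than empty strings; the real data may differ.

```python
import pandas as pd

# Sample data from the question; None stands in for the blank cells (an assumption)
df = pd.DataFrame({
    'id':   [1, 1, 2, 2, 2],
    'var1': ['Y', 'N', 'Y', 'N', 'Y'],
    'var2': ['N', None, None, 'Y', 'N'],
    'var3': ['Y', 'Y', 'N', 'N', 'Y'],
})
print(df)
```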

What I would like:

id  var1  var2 var3
1      Y     N    Y
2      Y     Y    Y

Essentially, the result should check each column for a Y or N and always give priority to Y. There are also more columns than var1, var2 and var3, so I would like a general solution that I can apply to the other columns as well.
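One way to make the "priority to Y" rule explicit is to rank the labels and keep the highest rank per id. The sketch below assumes a hypothetical ranking (`rank`, with N = 0 and Y = 1), treats blank cells as missing values, and applies to every column except `id`, so extra columns are picked up automatically:

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 2, 2, 2],
    'var1': ['Y', 'N', 'Y', 'N', 'Y'],
    'var2': ['N', None, None, 'Y', 'N'],
    'var3': ['Y', 'Y', 'N', 'N', 'Y'],
})

# Hypothetical priority ranking: the higher number wins within each id
rank = {'N': 0, 'Y': 1}
label = {v: k for k, v in rank.items()}

# Every column except the grouping key
cols = df.columns.drop('id')

# Replace labels with ranks; missing cells become NaN and are skipped by max()
ranked = df[cols].apply(lambda col: col.map(rank))

# Highest-ranked value per id, then translate the ranks back to labels
best = ranked.groupby(df['id']).max()
df_out = best.apply(lambda col: col.map(label)).reset_index()
print(df_out)
```

Because max() skips NaN, a blank cell never beats an N or a Y, and adding another label to `rank` would extend the rule without changing the rest of the code.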

spitfiredd asked Dec 23 '22 18:12


1 Answer

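Let's try this: once Y/N are converted to booleans, you can use groupby and sum as a logical OR, because a group's sum is nonzero whenever at least one of its rows is Y. That is what "giving Y priority" means here: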

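import pandas as pd

# True where a cell is 'Y'; 'N' and blank cells both become False
df1 = df.set_index('id').eq('Y')

# Summing booleans counts the Y's per id; any count above 0 means "Y wins"
df_out = (df1.groupby(level='id').sum()
             .astype(bool)
             .replace({True: 'Y', False: 'N'})
             .reset_index())

print(df_out)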

Output:

   id var1 var2 var3
0   1    Y    N    Y
1   2    Y    Y    Y
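Why the sum acts as an OR: summing booleans counts the True values, so a group's total is above zero exactly when at least one row is Y. GroupBy.any computes that OR directly. A minimal self-contained sketch comparing the two, again assuming the blank cells are missing values and using a hypothetical `cols` selection of the columns to flatten:

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 2, 2, 2],
    'var1': ['Y', 'N', 'Y', 'N', 'Y'],
    'var2': ['N', None, None, 'Y', 'N'],
    'var3': ['Y', 'Y', 'N', 'N', 'Y'],
})

# Columns to flatten; here every column except the grouping key
cols = df.columns.drop('id')

# True wherever a cell is 'Y'; 'N' and missing cells compare as False
is_yes = df[cols].eq('Y')

# Per id, "count of Y > 0" is a logical OR ...
by_sum = is_yes.groupby(df['id']).sum() > 0
# ... which is exactly what any() computes
by_any = is_yes.groupby(df['id']).any()
assert by_sum.equals(by_any)

# Map the booleans back to Y/N labels
df_out = by_any.apply(lambda col: col.map({True: 'Y', False: 'N'})).reset_index()
print(df_out)
```

Passing an explicit list as `cols` restricts the flattening to particular columns; `df.columns.drop('id')` simply takes all of them.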
Scott Boston answered Jan 02 '23 11:01