Pandas conditional groupby

I have a dataframe like below:

import pandas as pd

df = pd.DataFrame({'col_1': ['6ai', '6aii', '6aii', '6b'],
                   'col_2': [1, 1, 5, 1],
                   'col_3': [True, False, True, False]})

  col_1  col_2  col_3
0   6ai      1   True
1  6aii      1  False
2  6aii      5   True
3    6b      1  False

I want to group this dataframe by col_1 and, within each group, keep only the rows where col_3 is True. If a value occurs only once in col_1, I want to keep that row regardless of whether col_3 is True or False. So the result I'm after is:

  col_1  col_2  col_3
0   6ai      1   True
2  6aii      5   True
3    6b      1  False

I'm thinking that I should use groupby, but I'm not sure. I could really use some help please?

Robin Kuijs asked May 18 '26 01:05



2 Answers

Here is one way: keep a row if col_3 is True, or if its col_1 value is not duplicated.

df[df.col_3|~df.col_1.duplicated(keep=False)]
Out[344]: 
  col_1  col_2  col_3
0   6ai      1   True
2  6aii      5   True
3    6b      1  False
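
To see why this mask works, it can help to build it in two steps. The following is a minimal sketch, assuming the sample frame from the question with the col_1 values written as quoted strings; the names is_unique and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['6ai', '6aii', '6aii', '6b'],
                   'col_2': [1, 1, 5, 1],
                   'col_3': [True, False, True, False]})

# duplicated(keep=False) flags every row whose col_1 value is repeated,
# so the negation is True only for values that appear exactly once.
is_unique = ~df['col_1'].duplicated(keep=False)

# Keep rows where col_3 is True, plus rows whose col_1 value is unique.
result = df[df['col_3'] | is_unique]
print(result)
```

Note that a col_1 value that occurs several times with no True in col_3 would be dropped entirely by this mask, which matches the rule described in the question.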
BENY answered May 20 '26 13:05



You can use groupby().transform('count') to find the values that occur exactly once:

df[df['col_3'] | df.groupby('col_1')['col_3'].transform('count').eq(1)]

Output:

  col_1  col_2  col_3
0   6ai      1   True
2  6aii      5   True
3    6b      1  False
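
One detail that may matter on real data: 'count' counts only non-null values in each group, while 'size' counts every row. The sketch below is a variant that uses 'size' instead, assuming the same sample frame as in the question; the names group_size and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['6ai', '6aii', '6aii', '6b'],
                   'col_2': [1, 1, 5, 1],
                   'col_3': [True, False, True, False]})

# Number of rows in each col_1 group, broadcast back onto every row.
group_size = df.groupby('col_1')['col_3'].transform('size')

# Keep rows where col_3 is True, plus rows that are alone in their group.
result = df[df['col_3'] | group_size.eq(1)]
print(result)
```

On this sample both variants select the same rows, since col_3 has no missing values.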
Quang Hoang answered May 20 '26 14:05
