Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete groups of rows based on condition

Tags:

python

pandas

I have the following pandas dataframe:

df =

A          B       C
111-ABC    123    EEE
111-ABC    222    EEE
111-ABC    444    XXX
222-CCC    222    YYY
222-CCC    333    67T
333-DDD    123    TTT
333-DDD    123    BTB
333-DDD    444    XXX
333-DDD    555    AAA

I want to delete all groups of rows (grouped by A) that do not contain 123 in the column B.

The expected result is this one (the group of rows 222-CCC is deleted):

result =

A          B       C
111-ABC    123    EEE
111-ABC    222    EEE
111-ABC    444    XXX
333-DDD    123    TTT
333-DDD    123    BTB
333-DDD    444    AAA

How to do it? I assume that first of all I should use groupby, but how to filter out the groups of rows, not just particular rows?

result = df.groupby("A").... ??
like image 872
Dinosaurius Avatar asked Mar 11 '23 08:03

Dinosaurius


2 Answers

You can use groupby().filter() syntax:

df.groupby('A').filter(lambda g: (g.B == 123).any())

enter image description here

like image 96
Psidom Avatar answered Mar 15 '23 21:03

Psidom


using query

a = df.query('B == 123').A.unique()
df.query('A in @a')

         A    B    C
0  111-ABC  123  EEE
1  111-ABC  222  EEE
2  111-ABC  444  XXX
5  333-DDD  123  TTT
6  333-DDD  123  BTB
7  333-DDD  444  XXX
8  333-DDD  555  AAA

You can include additional conditions within the first query

b = df.query('B == 123 & C == "EEE"').A.unique()
df.query('A in @b')

          A    B    C
0  111-ABC  123  EEE
1  111-ABC  222  EEE
2  111-ABC  444  XXX

If speed is important. Try this.

cond1 = df.B.values == 123
a = np.unique(df.A.values[cond1])
df.loc[df.A.isin(a)]
like image 30
piRSquared Avatar answered Mar 15 '23 21:03

piRSquared