I have the following pandas dataframe:
df =
A          B       C
111-ABC    123    EEE
111-ABC    222    EEE
111-ABC    444    XXX
222-CCC    222    YYY
222-CCC    333    67T
333-DDD    123    TTT
333-DDD    123    BTB
333-DDD    444    XXX
333-DDD    555    AAA
I want to delete all groups of rows (grouped by A) that do not contain 123 in the column B.
The expected result is this one (the group of rows 222-CCC is deleted):
result =
A          B       C
111-ABC    123    EEE
111-ABC    222    EEE
111-ABC    444    XXX
333-DDD    123    TTT
333-DDD    123    BTB
333-DDD    444    AAA
How to do it? I assume that first of all I should use groupby, but how to filter out the groups of rows, not just particular rows?
result = df.groupby("A").... ??
                You can use groupby().filter() syntax:
df.groupby('A').filter(lambda g: (g.B == 123).any())

using query
a = df.query('B == 123').A.unique()
df.query('A in @a')
         A    B    C
0  111-ABC  123  EEE
1  111-ABC  222  EEE
2  111-ABC  444  XXX
5  333-DDD  123  TTT
6  333-DDD  123  BTB
7  333-DDD  444  XXX
8  333-DDD  555  AAA
You can include additional conditions within the first query
b = df.query('B == 123 & C == "EEE"').A.unique()
df.query('A in @b')
          A    B    C
0  111-ABC  123  EEE
1  111-ABC  222  EEE
2  111-ABC  444  XXX
If speed is important. Try this.
cond1 = df.B.values == 123
a = np.unique(df.A.values[cond1])
df.loc[df.A.isin(a)]
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With