I have the following pandas dataframe:
df =
A B C
111-ABC 123 EEE
111-ABC 222 EEE
111-ABC 444 XXX
222-CCC 222 YYY
222-CCC 333 67T
333-DDD 123 TTT
333-DDD 123 BTB
333-DDD 444 XXX
333-DDD 555 AAA
I want to delete all groups of rows (grouped by A
) that do not contain 123
in the column B
.
The expected result is this one (the group of rows 222-CCC
is deleted):
result =
A B C
111-ABC 123 EEE
111-ABC 222 EEE
111-ABC 444 XXX
333-DDD 123 TTT
333-DDD 123 BTB
333-DDD 444 AAA
How to do it? I assume that first of all I should use groupby
, but how to filter out the groups of rows, not just particular rows?
result = df.groupby("A").... ??
You can use groupby().filter()
syntax:
df.groupby('A').filter(lambda g: (g.B == 123).any())
using query
a = df.query('B == 123').A.unique()
df.query('A in @a')
A B C
0 111-ABC 123 EEE
1 111-ABC 222 EEE
2 111-ABC 444 XXX
5 333-DDD 123 TTT
6 333-DDD 123 BTB
7 333-DDD 444 XXX
8 333-DDD 555 AAA
You can include additional conditions within the first query
b = df.query('B == 123 & C == "EEE"').A.unique()
df.query('A in @b')
A B C
0 111-ABC 123 EEE
1 111-ABC 222 EEE
2 111-ABC 444 XXX
If speed is important. Try this.
cond1 = df.B.values == 123
a = np.unique(df.A.values[cond1])
df.loc[df.A.isin(a)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With