To illustrate what I want to achieve, here is a DataFrame called df:
   column1 column2
1      foo     faa
2      bar     car
3      dog     dog
4      cat     rat
5      foo     foo
6      bar     cat
7     bird     rat
8      cat     dog
9     bird     foo
10     bar     car
I want to subset the DataFrame: a row should be dropped if the string in column2 contains any one of several values.
This is easy enough for a single value, in this instance 'foo':
df = df[~df['column2'].str.contains("foo")]
But let's say I wanted to drop all rows in which the string in column2 contains 'cat' or 'foo'. As applied to df above, this would drop 3 rows (5, 6 and 9).
What would be the most efficient, most pythonic way to do this? This could be in the form of a function, multiple booleans, or something else I'm not thinking of.
isin doesn't work as it requires exact matches.
N.B: I have edited this question as I made a mistake with it the first time round. Apologies.
Use isin to test for membership in a list of values, and negate the resulting boolean mask with ~:
In [3]:
vals = ['bird','cat','foo']
df[~df['column2'].isin(vals)]
Out[3]:
   column1 column2
1        2     bar
2        3     dog
5        6     bar
9       10     bar
In [4]:
df['column2'].isin(vals)
Out[4]:
0 True
1 False
2 False
3 True
4 True
5 False
6 True
7 True
8 True
9 False
Name: column2, dtype: bool
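Since isin only matches whole values, the substring case from the question needs a different test. One possible approach, sketched here rather than offered as the definitive answer, is to join the values into a single regular expression with "|" and pass it to str.contains. The names substrings, pattern, mask and filtered are only illustrative, and the frame is rebuilt from the table in the question so the snippet runs on its own:

```python
import re

import pandas as pd

# Rebuild the example frame from the question (index 1..10).
df = pd.DataFrame(
    {
        "column1": ["foo", "bar", "dog", "cat", "foo",
                    "bar", "bird", "cat", "bird", "bar"],
        "column2": ["faa", "car", "dog", "rat", "foo",
                    "cat", "rat", "dog", "foo", "car"],
    },
    index=range(1, 11),
)

# Illustrative name: the substrings whose presence should drop a row.
substrings = ["cat", "foo"]

# Escape each value so it is matched literally, then join with "|" (regex OR).
pattern = "|".join(re.escape(s) for s in substrings)

# na=False treats missing values as "no match", so the mask stays boolean.
mask = df["column2"].str.contains(pattern, na=False)

# Keep only the rows whose column2 contains none of the substrings.
filtered = df[~mask]
print(filtered)
```

The re.escape call only matters if a value contains regex metacharacters such as "." or "+"; for plain words like these, "cat|foo" would behave the same.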