My question is kind of an extension of the question answered quite well in this link:
I've copied the answer below; it keeps only the rows whose string contains the word "ball":
In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
     ids  vals
0  aball     1
1  bball     2
3  fball     4
Now my question is: what if I have long sentences in my data and want to identify the strings that contain both "ball" AND "field"? Rows where only one of the two words occurs should be thrown away, and only rows whose string contains both words should be kept.
Filter rows by condition: you can use df[df["Courses"] == 'Spark'] to filter rows by a condition in a pandas DataFrame. Note that this expression returns a new DataFrame with the selected rows.
Python's built-in filter() function selects items from an iterable such as a string, list, or dictionary based on a condition. It keeps the items for which the condition returns true and discards those for which it returns false.
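As a rough illustration of filter() on plain Python strings (outside pandas), the sketch below keeps only the sentences that contain every word in a list. The names `sentences`, `words`, and `kept` and the sample text are made up for this example.

```python
# Hypothetical sample sentences, not taken from the original question.
sentences = [
    "the ball rolled onto the field",
    "a ball in the yard",
    "an empty field",
]
words = ["ball", "field"]

# filter() keeps an item only when the function returns True for it;
# all(...) is True only if every word is a substring of the sentence.
kept = list(filter(lambda s: all(w in s for w in words), sentences))
print(kept)
```

This works on ordinary lists; for a DataFrame column, boolean masks are usually the more natural tool.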
df[df['ids'].str.contains("ball")]
Would become:
df[df['ids'].str.contains("ball") & df['ids'].str.contains("field")]
If you are into neater code:
contains_balls = df['ids'].str.contains("ball")
contains_fields = df['ids'].str.contains("field")
filtered_df = df[contains_balls & contains_fields]
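Here is a self-contained sketch of the same idea that can be run as is. The sample sentences are invented for illustration; only the column names `ids` and `vals` come from the question. Passing `regex=False` is an optional choice made here so that each word is treated as a literal substring.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "ids": [
        "the ball rolled onto the field",
        "a ball in the yard",
        "an empty field",
        "nothing relevant",
    ],
    "vals": [1, 2, 3, 4],
})

# Each call returns a boolean Series aligned with df's index.
contains_balls = df["ids"].str.contains("ball", regex=False)
contains_fields = df["ids"].str.contains("field", regex=False)

# & combines the masks element-wise, so a row survives only if both are True.
filtered_df = df[contains_balls & contains_fields]
print(filtered_df)
```

Keep in mind that this is substring matching, so "ball" would also match inside "football".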
If you have more than two words, you can use this (note that it is not as fast as foxyblue's method):
l = ['ball', 'field']
df.ids.apply(lambda x: all(y in x for y in l))
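One possible way to handle any number of words while staying with the vectorized str.contains masks is to build one mask per word and AND them together with functools.reduce. This is only a sketch: the sample data and the names `words`, `masks`, and `all_present` are assumptions for illustration, and no timing is claimed here.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "ids": [
        "the ball rolled onto the field",
        "a ball in the yard",
        "an empty field",
        "nothing relevant",
    ],
    "vals": [1, 2, 3, 4],
})
words = ["ball", "field"]

# One boolean mask per word, each treated as a literal substring.
masks = [df["ids"].str.contains(w, regex=False) for w in words]

# AND all masks together: True only where every word is present.
all_present = reduce(operator.and_, masks)
filtered = df[all_present]

# The apply-based version yields the same mask on this data.
apply_mask = df["ids"].apply(lambda x: all(y in x for y in words))
print(filtered)
```

Note that the apply expression on its own returns a boolean Series; index the DataFrame with it, as in `df[apply_mask]`, to get the filtered rows.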