I have a bunch of DataFrames, and I want to find the ones that contain both of the words I specify. For example, I want to find all DataFrames that contain the words hello and world. A & B would qualify, C would not.
I've tried:
df[(df[column].str.contains('hello')) & (df[column].str.contains('world'))]
which only picks up B, and
df[(df[column].str.contains('hello')) | (df[column].str.contains('world'))]
which picks up all three.
I need something that picks up only A & B.
A =
Name Data
0 Mike hello
1 Mike world
2 Mike hello
3 Fred world
4 Fred hello
5 Ted world
B =
Name Data
0 Mike helloworld
1 Mike world
2 Mike hello
3 Fred world
4 Fred hello
5 Ted world
C =
Name Data
0 Mike hello
1 Mike hello
2 Mike hello
3 Fred hello
4 Fred hello
5 Ted hello
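Your masks are evaluated row by row, so & only matches rows where a single cell contains both words (the helloworld row in B), while | matches any row containing either word. What you want instead is a single boolean per DataFrame that is True when 'hello' is found anywhere in the column and 'world' is also found anywhere in that column: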
df.Data.str.contains('hello').any() & df.Data.str.contains('world').any()
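To apply that check to the whole collection, one option is to keep the frames in a dict and collect the names that pass. This is only a minimal sketch: it assumes the frames are named A, B and C as in the question and that the column is called Data, and the helper name contains_all_words is hypothetical.

```python
import pandas as pd

def contains_all_words(df, column, words):
    """Return True if every word appears somewhere in df[column]."""
    # Cast to str so non-string cells do not break the .str accessor
    col = df[column].astype(str)
    # regex=False treats each word as a literal substring
    return all(col.str.contains(word, regex=False).any() for word in words)

# Rebuild the three example frames from the question
names = ["Mike", "Mike", "Mike", "Fred", "Fred", "Ted"]
frames = {
    "A": pd.DataFrame({"Name": names,
                       "Data": ["hello", "world", "hello", "world", "hello", "world"]}),
    "B": pd.DataFrame({"Name": names,
                       "Data": ["helloworld", "world", "hello", "world", "hello", "world"]}),
    "C": pd.DataFrame({"Name": names, "Data": ["hello"] * 6}),
}

# Keep only the names of frames that contain every word
matching = [name for name, df in frames.items()
            if contains_all_words(df, "Data", ["hello", "world"])]
print(matching)  # ['A', 'B']
```

Using all() with a generator stops at the first missing word, so later words are not searched once a frame has already failed.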
If you have a list of words and need to check over the entire DataFrame, try:
import numpy as np
lst = ['hello', 'world']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
print(df)
Name Data Data2
0 Mike hello orange
1 Mike world banana
2 Mike hello banana
3 Fred world apples
4 Fred hello mango
5 Ted world pear
lst = ['apple', 'hello', 'world']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
# True
lst = ['apple', 'hello', 'world', 'bear']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
# False
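Note that word in x raises a TypeError when a cell is not a string (for example a number or NaN). A hedged variant that casts every cell to str first might look like the sketch below; the function name frame_contains_all and the numeric Count column are assumptions added for illustration, not part of the original data.

```python
import pandas as pd

def frame_contains_all(df, words):
    """Return True if every word is a substring of at least one cell in df."""
    # Cast every cell to str so numeric or missing cells do not raise TypeError
    cells = df.astype(str).to_numpy().ravel()
    return all(any(word in cell for cell in cells) for word in words)

df = pd.DataFrame({
    "Name": ["Mike", "Mike", "Mike", "Fred", "Fred", "Ted"],
    "Data": ["hello", "world", "hello", "world", "hello", "world"],
    "Data2": ["orange", "banana", "banana", "apples", "mango", "pear"],
    "Count": [1, 2, 3, 4, 5, 6],  # hypothetical numeric column
})

print(frame_contains_all(df, ["apple", "hello", "world"]))          # True
print(frame_contains_all(df, ["apple", "hello", "world", "bear"]))  # False
```

As in the original, matching is by substring, so 'apple' is found inside 'apples'.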