
Pandas filter by more than one "contains" for not one cell but entire column

I have a bunch of DataFrames, and I want to find the ones that contain both of the words I specify. For example, I want to find all DataFrames that contain the words hello and world. A & B would qualify, C would not.

I've tried: df[(df[column].str.contains('hello')) & (df[column].str.contains('world'))], which only picks up B, and df[(df[column].str.contains('hello')) | (df[column].str.contains('world'))], which picks up all three.

I need something that picks only A & B.

A =

    Name    Data   
0   Mike    hello    
1   Mike    world    
2   Mike    hello   
3   Fred    world
4   Fred    hello
5   Ted     world

B =

    Name    Data   
0   Mike    helloworld
1   Mike    world    
2   Mike    hello   
3   Fred    world
4   Fred    hello
5   Ted     world

C =

    Name    Data   
0   Mike    hello
1   Mike    hello    
2   Mike    hello   
3   Fred    hello
4   Fred    hello
5   Ted     hello
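
For anyone who wants to reproduce this, the three frames above can be rebuilt roughly as follows. The sketch also counts how many rows the row-wise `&` filter from the question keeps in each frame; the helper name `row_mask_both` is made up for illustration.

```python
import pandas as pd

names = ["Mike", "Mike", "Mike", "Fred", "Fred", "Ted"]

A = pd.DataFrame({"Name": names,
                  "Data": ["hello", "world", "hello", "world", "hello", "world"]})
B = pd.DataFrame({"Name": names,
                  "Data": ["helloworld", "world", "hello", "world", "hello", "world"]})
C = pd.DataFrame({"Name": names, "Data": ["hello"] * 6})


def row_mask_both(df, column="Data"):
    """Rows where a single cell contains both words (the `&` attempt)."""
    col = df[column]
    return df[col.str.contains("hello") & col.str.contains("world")]


# Number of rows each frame keeps under the row-wise `&` filter.
both_counts = {name: len(row_mask_both(df))
               for name, df in {"A": A, "B": B, "C": C}.items()}
print(both_counts)  # {'A': 0, 'B': 1, 'C': 0}
```

A contains both words, but never in the same cell, so the row-wise `&` leaves it empty.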

jason asked Jan 26 '23 20:01


1 Answer

You want a single boolean that is True when 'hello' is found anywhere in the column and 'world' is also found anywhere in the same column, not necessarily in the same cell:

df.Data.str.contains('hello').any() & df.Data.str.contains('world').any()
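
Applied to the question's setup, that check can be run over a collection of frames. A minimal sketch, assuming the frames are kept in a dict keyed by name; `has_all_words` and `frames` are hypothetical names, and `regex=False` / `na=False` are additions here so that the words are treated literally and missing values count as no match.

```python
import pandas as pd


def has_all_words(df, words, column="Data"):
    """True if every word appears somewhere in df[column] (not necessarily in one cell)."""
    col = df[column]
    return all(col.str.contains(w, regex=False, na=False).any() for w in words)


names = ["Mike", "Mike", "Mike", "Fred", "Fred", "Ted"]
frames = {
    "A": pd.DataFrame({"Name": names,
                       "Data": ["hello", "world", "hello", "world", "hello", "world"]}),
    "B": pd.DataFrame({"Name": names,
                       "Data": ["helloworld", "world", "hello", "world", "hello", "world"]}),
    "C": pd.DataFrame({"Name": names, "Data": ["hello"] * 6}),
}

# Keep only the names of frames whose column contains every word.
matching = [name for name, df in frames.items()
            if has_all_words(df, ["hello", "world"])]
print(matching)  # ['A', 'B']
```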

If you have a list of words and need to check across the entire DataFrame, try:

import numpy as np

lst = ['hello', 'world']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
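
One caveat: `word in x` raises a `TypeError` when a cell is not a string (an integer column, for instance). A possible variant, assuming it is acceptable to search the string form of every cell, converts the frame first. `frame_has_all` and the `Age` column are illustrative only.

```python
import pandas as pd


def frame_has_all(df, words):
    """True if every word occurs as a substring of at least one cell, in any column."""
    # Flatten all cells into one Series of strings so .str.contains works everywhere.
    cells = pd.Series(df.astype(str).to_numpy().ravel())
    return all(cells.str.contains(w, regex=False).any() for w in words)


df = pd.DataFrame({
    "Name": ["Mike", "Fred", "Ted"],
    "Age": [31, 45, 27],              # non-string column
    "Data": ["hello", "world", "hello"],
})

print(frame_has_all(df, ["hello", "world"]))  # True
print(frame_has_all(df, ["hello", "bear"]))   # False
```

Because numbers are compared by their string form, a word such as `'4'` would match the age `45`; restrict the columns first if that matters.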

Sample Data

print(df)
   Name   Data   Data2
0  Mike  hello  orange
1  Mike  world  banana
2  Mike  hello  banana
3  Fred  world  apples
4  Fred  hello   mango
5   Ted  world    pear

lst = ['apple', 'hello', 'world']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
# True ('apple' is found as a substring of 'apples')

lst = ['apple', 'hello', 'world', 'bear']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
# False
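
If whole-word matching is wanted rather than substring matching (so that 'apple' does not match 'apples'), one option is a regular expression with word boundaries. This is a sketch under that assumption, not part of the approach above; `frame_has_all_words` is a made-up name.

```python
import re

import pandas as pd


def frame_has_all_words(df, words):
    """True if every word appears as a whole word in at least one cell."""
    cells = pd.Series(df.astype(str).to_numpy().ravel())
    return all(
        # re.escape keeps any special characters in the word literal.
        cells.str.contains(rf"\b{re.escape(w)}\b", regex=True).any()
        for w in words
    )


df = pd.DataFrame({
    "Name": ["Mike", "Mike", "Mike", "Fred", "Fred", "Ted"],
    "Data": ["hello", "world", "hello", "world", "hello", "world"],
    "Data2": ["orange", "banana", "banana", "apples", "mango", "pear"],
})

print(frame_has_all_words(df, ["apples", "hello", "world"]))  # True
print(frame_has_all_words(df, ["apple", "hello", "world"]))   # False
```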

ALollz answered Jan 31 '23 07:01