
Fastest way to check if a string contains a string from a list

Tags: python, list

What is the fastest way to check whether a string contains any of the strings in a list?

The loop below works fine, but it is too slow for me when the string is large and the list is long.

test_string = "Hello! This is a test. I love to eat apples."

fruits = ['apples', 'oranges', 'bananas'] 

for fruit in fruits:
    if fruit in test_string:
        print(fruit + " is in the string")

asked Oct 04 '19 14:10 by Kristoffer



1 Answer

Assuming whole-word matches are enough (the set approach does not find arbitrary substrings), I'd suggest first tokenizing the string with nltk's RegexpTokenizer to strip out all special characters, and then using sets to find the intersection:

from nltk.tokenize import RegexpTokenizer
test_string = "Hello! This is a test. I love to eat apples."

tokenizer = RegexpTokenizer(r'\w+')
test_set = set(tokenizer.tokenize(test_string))
# {'Hello', 'I', 'This', 'a', 'apples', 'eat', 'is', 'love', 'test', 'to'}
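
If installing nltk just for this step is not desirable, a similar token set can be sketched with the standard library. This assumes that `re.findall` with the same `\w+` pattern is an acceptable stand-in for RegexpTokenizer; for this pattern the two should produce the same tokens.

```python
import re

test_string = "Hello! This is a test. I love to eat apples."

# \w+ matches runs of letters, digits and underscores, so punctuation
# and whitespace are dropped and only the words remain.
test_set = set(re.findall(r"\w+", test_string))
```
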

Having tokenized the string and built a set, take its intersection with the set of candidates:

set(['apples', 'oranges', 'bananas']) & test_set
# {'apples'}
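
If the substring behaviour of the loop in the question is needed instead (so that 'apple' is also found inside 'pineapples'), one alternative is to compile all candidates into a single regular expression and scan the string once. This is not part of the approach above, only a rough sketch: the name `find_substrings` is made up for illustration, and whether it is actually faster than the plain loop depends on the data, so it is worth timing on real input.

```python
import re

def find_substrings(text, candidates):
    """Return the candidates found in text, scanning it once (substring match)."""
    if not candidates:
        return set()
    # Escape each candidate so it is matched literally, and try longer
    # candidates first so a longer one wins over its own prefix.
    ordered = sorted(set(candidates), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(c) for c in ordered))
    # finditer yields non-overlapping matches only: a candidate that occurs
    # solely inside another matched candidate will not be reported.
    return {m.group(0) for m in pattern.finditer(text)}

test_string = "Hello! This is a test. I love to eat apples."
found = find_substrings(test_string, ['apples', 'oranges', 'bananas'])
```

Because of the non-overlapping limitation noted in the comments, this is only equivalent to the original loop when no candidate is contained in another.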

answered Oct 04 '22 07:10 by yatu