What is the fastest way to check which strings from a list occur in another string?
The following works, but it is too slow for me when the string is large and the list is long.
test_string = "Hello! This is a test. I love to eat apples."
fruits = ['apples', 'oranges', 'bananas']
for fruit in fruits:
    if fruit in test_string:
        print(fruit + " is contained in the string")
The easiest way to check whether a Python string contains a substring is to use the in operator. The in operator tests membership in Python data structures and returns a Boolean (either True or False).
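As a minimal sketch of the in operator applied to a whole list, the loop from the question can be written as a list comprehension. The helper name find_substrings is hypothetical and not part of the original post. This does not change the amount of work done; it only collects the matches instead of printing them.

```python
def find_substrings(text, candidates):
    """Return the candidates that occur anywhere in text (substring match)."""
    # `c in text` is True when c appears as a contiguous substring of text.
    return [c for c in candidates if c in text]


test_string = "Hello! This is a test. I love to eat apples."
fruits = ['apples', 'oranges', 'bananas']

found = find_substrings(test_string, fruits)
print(found)
# ['apples']
```

Note that this is a substring test, so 'apples' would also be found inside a longer word such as 'pineapples'.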
In Java, the ArrayList contains() method checks whether the specified element exists in the given list. It returns true if the element is found in the list and false otherwise.
For this I'd suggest first tokenizing the string with RegexpTokenizer to remove all special characters, and then using sets to find the intersection:
from nltk.tokenize import RegexpTokenizer
test_string = "Hello! This is a test. I love to eat apples."
tokenizer = RegexpTokenizer(r'\w+')
test_set = set(tokenizer.tokenize(test_string))
# {'Hello', 'I', 'This', 'a', 'apples', 'eat', 'is', 'love', 'test', 'to'}
Having tokenized the string and constructed a set, find the intersection with the set of search terms (set.intersection, or the & operator):
set(['apples', 'oranges', 'bananas']) & test_set
# {'apples'}
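For readers without NLTK installed, the same idea can be sketched with the standard library only, assuming that re.findall with the same \w+ pattern produces the same tokens for this input. The helper name find_words is hypothetical. Unlike the in loop from the question, this approach matches whole tokens only and is case-sensitive, so it is not a drop-in replacement when true substring matching is needed.

```python
import re


def find_words(text, candidates):
    """Return the candidates that appear as whole word tokens in text."""
    # Split the text into runs of word characters, dropping punctuation.
    tokens = set(re.findall(r'\w+', text))
    # Set intersection: average-case cost grows with the smaller set,
    # rather than rescanning the full text once per candidate.
    return set(candidates) & tokens


test_string = "Hello! This is a test. I love to eat apples."
fruits = ['apples', 'oranges', 'bananas']

print(find_words(test_string, fruits))
# {'apples'}
```

With this sketch, find_words("I like pineapples", fruits) returns an empty set, because 'pineapples' is a single token and 'apples' is not matched inside it.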