Fastest way to check if a string contains a string from a list

Tags:

python

list

What is the fastest way to check whether a string contains any of the strings in a list?

The loop below works, but it is too slow for me when the string is large and the list is long.

test_string = "Hello! This is a test. I love to eat apples."

fruits = ['apples', 'oranges', 'bananas'] 

for fruit in fruits:
    if fruit in test_string:
        print(fruit + " is contained in the string")


asked Oct 04 '19 14:10

Kristoffer

1 Answer

For this I'd suggest first tokenizing the string with NLTK's RegexpTokenizer to strip out punctuation, and then using sets to find the intersection:

from nltk.tokenize import RegexpTokenizer
test_string = "Hello! This is a test. I love to eat apples."

tokenizer = RegexpTokenizer(r'\w+')
test_set = set(tokenizer.tokenize(test_string))
# {'Hello', 'I', 'This', 'a', 'apples', 'eat', 'is', 'love', 'test', 'to'}

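Having tokenized the string and built a set, take its intersection with the set of candidates. Set membership tests take constant time on average, so the string is not rescanned once per list item: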

set(['apples', 'oranges', 'bananas']) & test_set
# {'apples'}
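
If installing NLTK is not an option, the same idea can be sketched with only the standard library. This is a minimal sketch, not the answer's own code: the helper name `words_found` is hypothetical, and it assumes that `re.findall(r"\w+", ...)` yields the same tokens as `RegexpTokenizer(r'\w+')` for this kind of input.

```python
import re

def words_found(text, candidates):
    """Return the candidates that occur as whole words in text.

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    # \w+ extracts runs of letters, digits and underscores, dropping punctuation
    tokens = set(re.findall(r"\w+", text))
    # Set intersection keeps only the candidates that appear as tokens
    return set(candidates) & tokens

test_string = "Hello! This is a test. I love to eat apples."
fruits = ['apples', 'oranges', 'bananas']
print(words_found(test_string, fruits))  # {'apples'}
```

Note that this matches whole words only, which differs from the `in` test in the question: `"apples" in "pineapples"` is true, but `words_found("pineapples", ["apples"])` returns an empty set. Multi-word entries in the list would not be found this way either.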

answered Oct 04 '22 07:10

yatu
