Possible Duplicate:
Check if multiple strings exist in another string
I am trying to find out if there is a nice, clean way to test a line for three different strings.
Basically, I am looping through a file using a for loop, and for each line I have to check whether it contains one of the three strings that I have set in a list.
So far I have found the multiple-condition if check, but it does not feel really elegant or efficient:
for line in file:
    if "string1" in line or "string2" in line or "string3" in line:
        print("found the string")
I was thinking of creating a list that contains string1, string2 and string3, and checking whether any of these is contained in the line. However, it doesn't seem that I can compare against the list without explicitly looping through it, and in that case I am basically in the same situation as with the multiple if conditions that I wrote above.
Is there any smart way to check against multiple strings without writing long if statements or looping through the elements of a list?
Use the String.contains() method for each substring. You can terminate the loop on the first match, or create a utility function that returns true if the specified string contains any of the substrings from the specified list.
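In Python, the same idea can be sketched with the in operator in place of String.contains(). The helper name contains_any below is hypothetical and only for illustration; it stops at the first match, as described above.

```python
def contains_any(text, substrings):
    """Return True if text contains at least one of the substrings."""
    for s in substrings:
        if s in text:
            return True  # terminate the loop on the first match
    return False


targets = ["string1", "string2", "string3"]
found = contains_any("a line with string2 in it", targets)
missing = contains_any("nothing relevant here", targets)
```

Wrapping the loop in a function keeps the calling code down to a single readable condition, even though a loop still runs underneath.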
Using Python's "in" operator: a simple and usually fast way to check whether a string contains a substring in Python is the "in" operator. This operator returns True if the string contains the substring; otherwise, it returns False.
The in operator in Python is used to check for membership in a data structure. It returns either True or False. To see whether a string has a substring, we put the substring on the left of in and the superstring on the right. Using the operator is the preferred way to invoke the __contains__ method on an object.
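To make the link between in and __contains__ concrete, here is a small sketch. The Line class is a hypothetical wrapper invented for this illustration; it is not part of the original question.

```python
class Line(object):
    """Hypothetical wrapper, used only to show that `in` calls __contains__."""

    def __init__(self, text):
        self.text = text

    def __contains__(self, item):
        # Python calls this method when evaluating `item in line_object`
        return item in self.text


line = Line("found string1 here")
via_operator = "string1" in line          # dispatches to Line.__contains__
via_method = "abc".__contains__("b")      # same result as "b" in "abc"
```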
Using regular expressions, we can check for multiple substrings in a single statement. We use the findall() function of the re module to get all the matches as a list of strings, and pass that list to the built-in any() function to get a True or False result.
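A minimal sketch of the findall() plus any() combination described above. The name has_match is hypothetical, and the sketch assumes the search strings are non-empty and should be matched literally, hence the re.escape call.

```python
import re

strings = ("string1", "string2", "string3")
# re.escape guards against regex metacharacters in the search strings
pattern = "|".join(re.escape(s) for s in strings)


def has_match(line):
    # findall returns a list of the matched substrings;
    # any() is True as soon as that list holds a non-empty string
    return any(re.findall(pattern, line))
```

Since only a yes/no answer is needed, re.search(pattern, line) would avoid collecting every match, but the version here follows the description literally.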
strings = ("string1", "string2", "string3")
for line in file:
    if any(s in line for s in strings):
        print("yay!")
This still loops through the Cartesian product of the two lists, but it does so in one line:
>>> lines1 = ['soup', 'butter', 'venison']
>>> lines2 = ['prune', 'rye', 'turkey']
>>> search_strings = ['a', 'b', 'c']
>>> any(s in l for l in lines1 for s in search_strings)
True
>>> any(s in l for l in lines2 for s in search_strings)
False
This also has the advantage that any short-circuits, so the looping stops as soon as a match is found. Note that this only finds the first occurrence of a string from search_strings in linesX. If you want to find multiple occurrences, you could do something like this:
>>> lines3 = ['corn', 'butter', 'apples']
>>> [(s, l) for l in lines3 for s in search_strings if s in l]
[('c', 'corn'), ('b', 'butter'), ('a', 'apples')]
If you feel like coding something more complex, it seems the Aho-Corasick algorithm can test for the presence of multiple substrings in a given input string. (Thanks to Niklas B. for pointing that out.) I still think it would result in quadratic performance for your use-case since you'll still have to call it multiple times to search multiple lines. However, it would beat the above (cubic, on average) algorithm.
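For the curious, here is a compact sketch of an Aho-Corasick style matcher. The function names build_automaton and matches_any are hypothetical, the sketch assumes non-empty patterns, and it only reports whether any pattern occurs (not which one or where). The automaton is built once and can be reused for every line.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links; returns (goto, fail, terminal)."""
    goto, fail, terminal = [{}], [0], [False]
    for pattern in patterns:
        node = 0
        for ch in pattern:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                terminal.append(False)
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        terminal[node] = True
    # Breadth-first pass: a node's failure link points to the longest
    # proper suffix of its path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # A node also counts as a hit if one of its suffixes is a pattern
            terminal[child] = terminal[child] or terminal[fail[child]]
            queue.append(child)
    return goto, fail, terminal


def matches_any(text, automaton):
    """Return True if text contains at least one of the patterns."""
    goto, fail, terminal = automaton
    node = 0
    for ch in text:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        if terminal[node]:
            return True  # stop at the first hit
    return False


automaton = build_automaton(['a', 'b', 'c'])
hit = any(matches_any(l, automaton) for l in ['soup', 'butter', 'venison'])
```

Each line is scanned once, character by character, regardless of how many search strings there are, which is where the gain over the nested loops would come from when the list of search strings is long.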
One approach is to combine the search strings into a regex pattern as in this answer.
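A rough sketch of that regex approach, assuming the search strings should be matched literally. The names regex and matching_lines are hypothetical; the pattern is compiled once and reused for every line.

```python
import re

strings = ("string1", "string2", "string3")
# Join the escaped search strings into one alternation: string1|string2|string3
regex = re.compile("|".join(re.escape(s) for s in strings))


def matching_lines(lines):
    """Yield the lines that contain at least one of the search strings."""
    for line in lines:
        if regex.search(line):
            yield line


hits = list(matching_lines(["no hit", "has string3", "string1!"]))
```

Because a file object iterates over its lines, an open file could be passed to matching_lines directly.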