I need to search a string for multiple words.
import re

words = [{'word': 'test1', 'case': False}, {'word': 'test2', 'case': False}]
status = "test1 test2"
for w in words:
    if w['case']:
        r = re.compile(r"\s#?%s" % w['word'], re.IGNORECASE | re.MULTILINE)
    else:
        r = re.compile(r"\s#?%s" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])
For some reason, this will only ever find "test2" and never "test1". Why is this?
I know I can use a |-delimited search, but there could be hundreds of words, which is why I am using a for loop.
Use the | (pipe) operator to specify multiple patterns.
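As a rough sketch of the pipe approach, the alternation can be built from the word list instead of being typed by hand, so hundreds of words are not a problem. The word list here is hypothetical, and re.escape is added on the assumption that some words may contain regex metacharacters.

```python
import re

# Hypothetical word list; "c++" shows why escaping matters.
words = ["test1", "test2", "c++"]

# Join the escaped words into one alternation: test1|test2|c\+\+
pattern = re.compile("|".join(re.escape(w) for w in words))

status = "test1 test2"
found = pattern.findall(status)
print(found)  # ['test1', 'test2']
```

A single compiled pattern scans the string once, rather than once per word.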
re.findall(pattern, string) returns a list of matching strings. re.finditer(pattern, string) returns an iterator over match objects.
re.search() scans the whole string and returns a match object for the first match it finds, wherever that is. re.match() returns None whenever the match is not at the very beginning of the string, for example when it sits on the second line.
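A small illustration of that difference, using a made-up two-line string (the text is an assumption, not taken from the question):

```python
import re

# Hypothetical input: the word only appears on the second line.
text = "first line\nsecond line has test1"

at_start = re.match("test1", text)    # anchored at the start of the string
anywhere = re.search("test1", text)   # scans the whole string

print(at_start)           # None
print(anywhere.group())   # test1
```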
According to the Python docs, re.finditer(pattern, string, flags=0) returns an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.
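To make the findall/finditer distinction concrete, here is a minimal sketch on a made-up status string. The pattern test\d is only an illustration.

```python
import re

status = "test1 test2 test1"
pattern = re.compile(r"test\d")

# findall gives the matched strings only.
strings = pattern.findall(status)

# finditer gives match objects, so positions are available too.
spans = [(m.group(), m.start()) for m in pattern.finditer(status)]

print(strings)  # ['test1', 'test2', 'test1']
print(spans)    # [('test1', 0), ('test2', 6), ('test1', 12)]
```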
There is no space before test1 in status, while your generated regular expressions require there to be a space.
You can modify the test to match either after a space or at the beginning of a line:
for w in words:
    if w['case']:
        r = re.compile(r"(^|\s)#?%s" % w['word'], re.IGNORECASE | re.MULTILINE)
    else:
        r = re.compile(r"(^|\s)#?%s" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])
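A quick side-by-side check of the two patterns on the status string from the question. This is a simplified sketch: the words are plain strings rather than dicts, and the case flag is left out.

```python
import re

status = "test1 test2"
words = ["test1", "test2"]

# Pattern from the question: requires whitespace before the word.
original = [w for w in words
            if re.search(r"\s#?%s" % w, status, re.MULTILINE)]

# Pattern from this answer: whitespace or start of line.
fixed = [w for w in words
         if re.search(r"(^|\s)#?%s" % w, status, re.MULTILINE)]

print(original)  # ['test2']
print(fixed)     # ['test1', 'test2']
```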
As Martijn pointed out, there's no space before test1. But also your code doesn't properly handle the case when a word is longer. Your code would find test2blabla as an instance of test2, and I'm not sure if that is what you want.
I suggest using the word boundary regex \b:
for w in words:
    if w['case']:
        r = re.compile(r"\b%s\b" % w['word'], re.IGNORECASE | re.MULTILINE)
    else:
        r = re.compile(r"\b%s\b" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])
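To see the effect of \b, compare it with the whitespace-prefixed pattern on a longer word. The input strings are made up for illustration.

```python
import re

loose = re.compile(r"(^|\s)#?test2")   # no check after the word
bounded = re.compile(r"\btest2\b")     # word boundary on both sides

# The loose pattern accepts a longer word that merely starts with test2.
loose_hit = bool(loose.search("test1 test2blabla"))      # True
bounded_hit = bool(bounded.search("test1 test2blabla"))  # False

# Both accept the exact word, even when punctuation follows it.
exact_hit = bool(bounded.search("test1 test2, done"))    # True

print(loose_hit, bounded_hit, exact_hit)  # True False True
```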
EDIT:
I should've pointed out that if you really want to allow only (whitespace)word or (whitespace)#word format, you cannot use \b.
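The reason is that \b matches at any word/non-word transition, so it also accepts a word preceded by punctuation such as a hyphen. One possible alternative, not part of the original answer, is a negative lookbehind (?<!\S), which only succeeds at the start of the string or after whitespace. A sketch under that assumption:

```python
import re

boundary = re.compile(r"\btest1\b")
# Assumed alternative: start-of-string or whitespace, optional '#', then the word.
strict = re.compile(r"(?<!\S)#?test1\b")

# \b accepts a hyphen before the word; the lookbehind does not.
boundary_hyphen = bool(boundary.search("foo-test1"))  # True
strict_hyphen = bool(strict.search("foo-test1"))      # False

# The strict pattern still accepts the two formats from the question.
strict_plain = bool(strict.search("test1 test2"))     # True
strict_hash = bool(strict.search("say #test1 now"))   # True

print(boundary_hyphen, strict_hyphen, strict_plain, strict_hash)
```

The trailing \b keeps the protection against longer words such as test1blabla.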