I have a long list of words and regular expression patterns in a .txt file, which I read in like this:
with open(fileName, "r") as f1:
    pattern_list = f1.read().split('\n')
For illustration, the first seven look like this:
print pattern_list[:7]
# ['abandon*', 'abuse*', 'abusi*', 'aching', 'advers*', 'afraid', 'aggress*']
I want to know whenever I match a word from an input string to any of the words/patterns in pattern_list. The below sort of works, but I see two problems:
if w in regex_compile_list:, it didn't work right.) What am I doing wrong, and how can I be more efficient? Thanks in advance for your patience with a noob, and thanks for any insight!
import re

string_input = "People who have been abandoned or abused will often be afraid of adversarial, abusive, or aggressive behavior. They are aching to abandon the abuse and aggression."
for raw_str in pattern_list:
    pat = re.compile(raw_str)
    for w in string_input.split():
        if pat.match(w):
            print "matched:", raw_str, "with:", w
#matched: abandon* with: abandoned
#matched: abandon* with: abandon
#matched: abuse* with: abused
#matched: abuse* with: abusive,
#matched: abuse* with: abuse
#matched: abusi* with: abused
#matched: abusi* with: abusive,
#matched: abusi* with: abuse
#matched: ache* with: aching
#matched: aching with: aching
#matched: advers* with: adversarial,
#matched: afraid with: afraid
#matched: aggress* with: aggressive
#matched: aggress* with: aggression.
Method: join the regexes, then loop with re.match(). Build one new regex string by joining every pattern in the regex list with "|", then test each string against it with re.match() to see whether it matches any element of the list.
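A minimal sketch of the joined-regex idea, written for Python 3. The names regex_list, combined, and matches_any are made up for illustration, and the patterns are assumed to be real regular expressions rather than shell-style wildcards.

```python
import re

# Hypothetical regex list for illustration (real regexes, not shell wildcards).
regex_list = [r"abandon\w*", r"abus[ei]\w*", r"aching"]

# Join every pattern into one alternation. The non-capturing group keeps
# the trailing "$" anchor applied to all alternatives, not just the last.
combined = re.compile("(?:" + "|".join(regex_list) + ")$")


def matches_any(word):
    """Return True if the whole word matches at least one pattern."""
    return combined.match(word) is not None


words = ["abandoned", "abusive", "aching", "behavior"]
matched = [w for w in words if matches_any(w)]
print(matched)
```

Compiling once and testing each word against a single pattern avoids recompiling inside the loop, though it no longer tells you which individual pattern matched.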
For matching shell-style wildcards you could (ab)use the fnmatch module.
As fnmatch is primarily designed for filename comparison, the test will be case-sensitive or not depending on your operating system. So you'll have to normalize both the text and the pattern (here, I use lower() for that purpose).
>>> import fnmatch
>>> pattern_list = ['abandon*', 'abuse*', 'abusi*', 'aching', 'advers*', 'afraid', 'aggress*']
>>> string_input = "People who have been abandoned or abused will often be afraid of adversarial, abusive, or aggressive behavior. They are aching to abandon the abuse and aggression."
>>> for pattern in pattern_list:
...     l = fnmatch.filter(string_input.lower().split(), pattern.lower())
...     if l:
...         print pattern, "match", l
Producing:
abandon* match ['abandoned', 'abandon']
abuse* match ['abused', 'abuse']
abusi* match ['abusive,']
aching match ['aching']
advers* match ['adversarial,']
afraid match ['afraid']
aggress* match ['aggressive', 'aggression.']
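If the list is long, one possible refinement (a sketch in Python 3, not part of the answer above) is to translate each wildcard to a compiled regex once with fnmatch.translate() and to strip punctuation from the words first. The names compiled, words, and hits are illustrative only.

```python
import fnmatch
import re
import string

pattern_list = ['abandon*', 'abuse*', 'abusi*', 'aching', 'advers*', 'afraid', 'aggress*']
string_input = ("People who have been abandoned or abused will often be afraid of "
                "adversarial, abusive, or aggressive behavior. "
                "They are aching to abandon the abuse and aggression.")

# Translate each wildcard to a regex once, up front. IGNORECASE makes the
# result independent of the operating system's filename rules.
compiled = [(p, re.compile(fnmatch.translate(p), re.IGNORECASE)) for p in pattern_list]

# Strip leading/trailing punctuation so "abusive," is reported as "abusive".
words = [w.strip(string.punctuation) for w in string_input.split()]

hits = {}
for raw, regex in compiled:
    found = [w for w in words if regex.match(w)]
    if found:
        hits[raw] = found

for raw, found in hits.items():
    print(raw, "match", found)
```

The regex produced by fnmatch.translate() is anchored at the end of the string, so match() here behaves like a whole-word wildcard test.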
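As a regular expression, abandon* means "abando" followed by zero or more "n", so it will match abandonnnnnnnnnnnnnnnnnnnnnnn in full, and not abandonasfdsafdasf (re.match() only appears to accept the latter because it is satisfied with a match on the prefix). You want abandon.* instead.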
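The difference can be checked directly. This is a small Python 3 sketch using re.fullmatch(), which requires the entire word to match; the variable names are illustrative.

```python
import re

# "n*" means zero or more "n"; ".*" means zero or more of any character.
star_only = re.compile(r"abandon*")
dot_star = re.compile(r"abandon.*")

results = {}
for word in ["abandonnnnn", "abandoned", "abando"]:
    results[word] = (bool(star_only.fullmatch(word)), bool(dot_star.fullmatch(word)))
    print(word, results[word])

# re.match() only anchors at the start, so the prefix "abandon" is enough
# for it to succeed on "abandoned" even with the "n*" pattern.
prefix_hit = star_only.match("abandoned").group(0)
print(prefix_hit)
```

This is why the loop in the question reports abuse* as matching abusive: with match(), the pattern only has to match "abus" at the start of the word.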