I am not too familiar with RE but I am trying to iterate over a list and use re.sub to take out multiple items from a large block of text that is held in the variable first_word.
I use re.sub to remove tags first and this works fine, but I next want to remove all the strings in the exclusionList variable and I am not sure how to do this.
Thanks for the help, here is the code that raises the exception.
exclusionList = ['+','of','<ET>f.','to','the','<L>L.</L>']
for a in range(0, len(exclusionList)):
    first_word = re.sub(exclusionList[a], '',first_word)
And the exception:
    first_word = re.sub(exclusionList[a], '',first_word)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 245, in _compile
    raise error, v # invalid expression
error: nothing to repeat
The plus symbol is an operator in regex meaning 'one or more repetitions of the preceding'. E.g., x+ means one or more repetitions of x. A pattern that consists of + alone has nothing before it to repeat, which is why you get that error. If you want to find and replace actual + signs, you need to escape them like this: re.sub(r'\+', '', string). So change the first entry in your exclusionList to r'\+'.
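To see the difference between the unescaped and escaped pattern, here is a minimal sketch. The sample string is made up for illustration, and re.escape from the standard library is shown as one possible alternative to escaping by hand:

```python
import re

sample = "a + b"  # hypothetical sample text, not from the question

# An unescaped '+' has nothing before it to repeat, so compiling it fails.
try:
    re.sub('+', '', sample)
    failed = False
except re.error:
    failed = True

# Escaping by hand; the raw string keeps the backslash intact for the regex engine.
by_hand = re.sub(r'\+', '', sample)

# re.escape() builds a pattern that matches the given string literally.
escaped = re.sub(re.escape('+'), '', sample)
```

Both substitutions remove the literal + and leave the rest of the text untouched.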
You can also eliminate the for loop, like this:
exclusions = '|'.join(exclusionList)
first_word = re.sub(exclusions, '', first_word)
The pipe symbol | indicates a disjunction in regex, so x|y|z matches x or y or z.
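Note that the joined pattern is still interpreted as a regex, so any special characters in the entries (the + and the . here) keep their special meaning unless they are escaped. A sketch of one way to handle this, assuming every entry should be matched literally, is to run each entry through re.escape before joining. The value of first_word below is a made-up sample:

```python
import re

exclusionList = ['+', 'of', '<ET>f.', 'to', 'the', '<L>L.</L>']

# Escape every entry so characters such as '+' and '.' are matched literally,
# then join the escaped entries into a single alternation.
exclusions = '|'.join(re.escape(item) for item in exclusionList)

first_word = "<L>L.</L> + <ET>f. name of place"  # hypothetical sample text
cleaned = re.sub(exclusions, '', first_word)

# The removed items leave their surrounding whitespace behind,
# so split() is used here to look at the remaining words.
remaining_words = cleaned.split()
```

With this approach the original list can stay as it is, since the escaping happens when the pattern is built.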