I'm pretty new to Python. I have a dictionary with some keys in it, and a string. I have to replace text in the string wherever a pattern from the dictionary occurs in it. Both the dictionary and the string are very large. I'm using a regex to find the patterns.
It all works fine until a key like '-(' or '(-)' pops up, in which case Python raises an error for an unbalanced parenthesis.
Here's how the code I've written looks:
import re

somedict={'-(':'value1','(-)':'value2'}
somedata='this is some data containing -( and (-)'
for key in somedict.iterkeys():
    somedata=re.sub(key, 'newvalue', somedata)
Here's the error I get in the console:
Traceback (most recent call last):
  File "<console>", line 2, in <module>
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
error: unbalanced parenthesis
I've also tried it many ways using the regex compiler and searched a lot but didn't find anything addressing the problem. Any help is appreciated.
You need to escape the key using re.escape():
somedata = re.sub(re.escape(key), 'newvalue', somedata)
otherwise the contents will be interpreted as a regular expression.
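As a minimal sketch of the escaped version, reusing the dictionary and string from the question: the loop below iterates over the dictionary directly (instead of calling iterkeys()) so that it runs on both Python 2 and Python 3, and it stores the output in a new variable named result, which is an illustrative name only.

```python
import re

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'

result = somedata
for key in somedict:
    # re.escape() backslash-escapes regex metacharacters such as ( and ),
    # so the key is matched as literal text instead of being parsed
    # as a (broken) group.
    result = re.sub(re.escape(key), 'newvalue', result)

print(result)
```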
You are not using any regular expression features here, though, so you may as well just use str.replace():
somedata = somedata.replace(key, 'newvalue')
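A small sketch of the plain-string approach applied to every key might look like the following. The helper name replace_keys and its parameters are hypothetical, introduced only for illustration; the fixed replacement text 'newvalue' follows the question.

```python
def replace_keys(text, keys, replacement):
    """Replace every literal occurrence of each key in text.

    Hypothetical helper: str.replace() treats each key as plain text,
    so characters like ( and ) need no escaping.
    """
    for key in keys:
        text = text.replace(key, replacement)
    return text


somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'

result = replace_keys(somedata, somedict, 'newvalue')
print(result)
```

Note that the replacements are applied one key after another, so text produced by an earlier replacement could in principle be matched by a later key.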
If you wanted to replace only whole words (so with whitespace or punctuation marks around them, or at the start or end of the input string), you need some kind of boundary anchors, at which point it makes sense to use regular expressions. If all you have are alphanumeric words (plus underscores), \b would work:
somedata = re.sub(r'\b{}\b'.format(re.escape(key)), 'newvalue', somedata)
This puts \b before and after the string you wanted to replace, so that baz in foo baz bar is changed, but foo bazbaz bar is not.
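The behaviour described above can be checked with a short sketch. The variable names are illustrative only; the last case shows why \b does not help with a key such as '-(', which has no word characters at its edges.

```python
import re

key = 'baz'
pattern = r'\b{}\b'.format(re.escape(key))

# A standalone word is replaced.
changed = re.sub(pattern, 'newvalue', 'foo baz bar')

# 'bazbaz' has no word boundary between the two halves, so nothing matches.
unchanged = re.sub(pattern, 'newvalue', 'foo bazbaz bar')

# With a key made of non-word characters, \b never matches next to it
# when it is surrounded by spaces, so the text is left alone.
symbol_pattern = r'\b{}\b'.format(re.escape('-('))
symbol_result = re.sub(symbol_pattern, 'newvalue', 'data -( and')

print(changed)
print(unchanged)
print(symbol_result)
```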
For input that involves non-alphanumeric 'words', you'd need to match whitespace-or-start and whitespace-or-end anchors with look-aheads and look-behinds:
somedata = re.sub(r'(?:^|(?<=\s)){}(?:$|(?=\s))'.format(re.escape(key)), 'newvalue', somedata)
Here the pattern (?:^|(?<=\s)) uses two anchors, the start-of-string anchor and a look-behind assertion, to match the places where there is either the start of the string or a whitespace character immediately to the left. Similarly, (?:$|(?=\s)) does the same for the other end, matching the end of the string or a position followed by a whitespace character.
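To see the whitespace-or-edge anchors in action, here is a minimal sketch that wraps the pattern in a function. The name replace_whole_token and its parameters are hypothetical, chosen only for this example.

```python
import re


def replace_whole_token(text, key, replacement):
    """Replace key only where it is delimited by whitespace or string edges.

    Hypothetical helper built around the look-behind / look-ahead pattern.
    """
    pattern = r'(?:^|(?<=\s)){}(?:$|(?=\s))'.format(re.escape(key))
    return re.sub(pattern, replacement, text)


somedata = 'this is some data containing -( and (-)'

# '-(' stands alone between spaces, so it is replaced.
standalone = replace_whole_token(somedata, '-(', 'newvalue')

# The first '-(' is glued to other characters and is skipped;
# the second one follows a space and ends the string, so it matches.
embedded = replace_whole_token('x-(y -(', '-(', 'newvalue')

print(standalone)
print(embedded)
```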
Don't use re for something so simple; just replace:
somedata = somedata.replace(key, 'newvalue')
That said, if you're constructing a regexp from something, use re.escape to escape special characters:
somedata=re.sub(re.escape(key), 'newvalue', somedata)
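Since the question mentions that both the dictionary and the string are very large, one possible variation is to build a single pattern from all escaped keys and scan the string once. This is only a sketch under assumptions not stated in the question: it replaces each key with its dictionary value (the original code uses the fixed text 'newvalue'), and it sorts keys longest-first so that a longer key is preferred over a shorter key that is its prefix.

```python
import re

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'

# Longest keys first: regex alternation tries branches left to right,
# so this keeps a short key from shadowing a longer one it prefixes.
keys = sorted(somedict, key=len, reverse=True)

# Escape every key, then join them into one alternation.
pattern = re.compile('|'.join(re.escape(key) for key in keys))

# Assumption: each match is replaced by the value stored under that key.
result = pattern.sub(lambda match: somedict[match.group(0)], somedata)

print(result)
```

Using a function as the replacement also means backslashes in the dictionary values are inserted as-is rather than being processed as replacement-template escapes.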