python regex error: unbalanced parenthesis

Tags:

python

regex

I'm pretty new to Python. I have a dictionary with some keys in it, and a string, and I need to replace any part of the string that matches one of the dictionary's keys. Both the dictionary and the string are very large. I'm using a regex to find the patterns.

It all works fine until a key like '-(' or '(-)' comes up, at which point Python raises an "unbalanced parenthesis" error.

Here's how the code I've written looks:

import re

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'
for key in somedict:
    # each key is passed to re.sub() as a regex pattern, unescaped
    somedata = re.sub(key, 'newvalue', somedata)

Here's the error I get in the console:

Traceback (most recent call last):
  File "<console>", line 2, in <module>
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
error: unbalanced parenthesis

I've also tried several variations, including compiling the pattern first, and searched a lot, but didn't find anything that addresses the problem. Any help is appreciated.

asked Apr 11 '13 by Josyula Krishna


2 Answers

You need to escape the key using re.escape():

somedata = re.sub(re.escape(key), 'newvalue', somedata)

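otherwise the key is interpreted as a regular expression, so ( and ) are treated as group delimiters: '-(' is an invalid pattern (hence the error), and '(-)' is a valid pattern that matches just the single character -.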
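A minimal sketch (Python 3, reusing the question's somedict and somedata) of what escaping changes: unescaped, '(-)' is parsed as a capturing group around '-', while re.escape() makes each key match literally, so the original loop runs without an error.

```python
import re

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'

# Unescaped, '(-)' is a valid pattern: a capturing group around '-',
# so it matches every bare '-' rather than the three-character key.
unescaped_hits = re.findall('(-)', somedata)

# re.escape() backslash-escapes metacharacters such as '(' and ')',
# so the escaped pattern matches only the literal text '(-)'.
escaped_hits = re.findall(re.escape('(-)'), somedata)

# With every key escaped, the question's loop runs without re.error.
for key in somedict:
    somedata = re.sub(re.escape(key), 'newvalue', somedata)

print(unescaped_hits)  # ['-', '-']
print(escaped_hits)    # ['(-)']
print(somedata)        # this is some data containing newvalue and newvalue
```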

You aren't using any regular expression features here, though, so you may as well just use plain string replacement:

somedata = somedata.replace(key, 'newvalue')
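A sketch of the str.replace() version of the question's loop, in Python 3. It assumes each key should be replaced by its own dictionary value instead of the fixed 'newvalue'; that is a guess about the intent, not something the question states.

```python
somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'

# str.replace() treats its first argument as plain text, so '(' and ')'
# need no escaping and no regex is compiled at all.
# Assumption: each key maps to its own replacement value.
for key, value in somedict.items():
    somedata = somedata.replace(key, value)

print(somedata)  # this is some data containing value1 and value2
```

Because the replacements run one after another, a replacement value that itself contains a later key would be replaced again.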

If you want to replace only whole words (that is, with whitespace or punctuation around them, or at the start or end of the input string), you need some kind of boundary anchor, and at that point it makes sense to use regular expressions. If all your keys are alphanumeric words (plus underscores), \b works:

somedata = re.sub(r'\b{}\b'.format(re.escape(key)), 'newvalue', somedata)

This puts \b before and after the string you want to replace, so that baz in 'foo baz bar' is changed, but in 'foo bazbaz bar' it is not.
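To see where \b does and doesn't help, here is a small Python 3 sketch; replace_word is a hypothetical helper name, not part of the re module. The last call shows that \b cannot anchor a key like '-(', since neither side of that key is a word character, which is the case the next paragraph addresses.

```python
import re

def replace_word(text, key, new):
    # Hypothetical helper. \b only matches between a word character
    # ([A-Za-z0-9_]) and a non-word character (or the string edge),
    # so this is only suitable for alphanumeric keys.
    return re.sub(r'\b{}\b'.format(re.escape(key)), new, text)

print(replace_word('foo baz bar', 'baz', 'newvalue'))     # foo newvalue bar
print(replace_word('foo bazbaz bar', 'baz', 'newvalue'))  # foo bazbaz bar
# Unchanged: there is no \b between ' ' and '-', so '-(' never matches.
print(replace_word('foo -( bar', '-(', 'newvalue'))       # foo -( bar
```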

For keys containing non-alphanumeric characters, such as '-(', \b won't work, because it only matches next to a word character. Instead, you'd need whitespace-or-start and whitespace-or-end anchors built from look-behinds and look-aheads:

somedata = re.sub(r'(?:^|(?<=\s)){}(?:$|(?=\s))'.format(re.escape(key)), 'newvalue', somedata)

Here the pattern (?:^|(?<=\s)) uses two anchors, the start-of-string anchor and a look-behind assertion, to match the places where there is either the start of the string or a whitespace character immediately to the left. Similarly, (?:$|(?=\s)) does the same for the other end, matching the end of the string or a position followed by whitespace.
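Applied to the question's data, the look-around pattern handles the punctuation keys. This is a Python 3 sketch; replace_token is a hypothetical helper name, and mapping each key to its own dictionary value is an assumption about what the asker wants.

```python
import re

def replace_token(text, key, new):
    # Hypothetical helper. Match key only where it is preceded by the
    # start of the string or whitespace, and followed by the end of the
    # string or whitespace.
    pattern = r'(?:^|(?<=\s)){}(?:$|(?=\s))'.format(re.escape(key))
    return re.sub(pattern, new, text)

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'

# Assumption: each key is replaced by its own dictionary value.
for key, value in somedict.items():
    somedata = replace_token(somedata, key, value)

print(somedata)  # this is some data containing value1 and value2
# Neither '-(' below is a standalone token, so nothing is replaced.
print(replace_token('x-( -(y', '-(', 'value1'))  # x-( -(y
```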

answered Nov 14 '22 by Martijn Pieters


Don't use re for something this simple; just use str.replace():

somedata = somedata.replace(key, 'newvalue')

That said, if you're building a regular expression from arbitrary strings, use re.escape() to escape special characters:

somedata = re.sub(re.escape(key), 'newvalue', somedata)
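Since the question mentions that both the dictionary and the string are very large, one possible extension (not taken from either answer) is to escape all keys, join them into a single alternation, and replace everything in one pass using a function as the replacement. This is a Python 3 sketch; replace_all is a hypothetical helper, and sorting the keys longest-first is a design choice so that a key which is a prefix of another does not win.

```python
import re

def replace_all(text, mapping):
    # Hypothetical helper. An empty alternation would match the empty
    # string everywhere, so return early when there is nothing to replace.
    if not mapping:
        return text
    # Python tries alternatives left to right and takes the first that
    # matches, so put longer keys first to prefer them over their prefixes.
    keys = sorted(mapping, key=len, reverse=True)
    # Escape every key so '(' and ')' are matched literally.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # The function's return value is inserted as-is (no backslash handling).
    return pattern.sub(lambda m: mapping[m.group(0)], text)

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'
print(replace_all(somedata, somedict))
# this is some data containing value1 and value2
```

Unlike the key-by-key loop, the text is scanned only once, so a replacement value is never itself rescanned for other keys.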
answered Nov 14 '22 by Pavel Anossov