
Literal parenthesis with python regex

I have a dictionary (e.g. English-Croatian) that may contain sentences and phrases. I'm translating a file of the form "english text" = "english text" into the form "english text" = "croatian text", using the Python re module to do so. The regex I'm using looks like this (given a variable original holding the English text that should be translated):

regexString = '(?<= = ")'+original+'(?=")'

That way I'm able to match exactly the English text inside the quotes on the right-hand side and substitute it with Croatian. However, a problem appears if the original text contains a parenthesis. For example:

original = 'This is a wonderland :)'

In that case an "unbalanced parenthesis" error is raised. If original were hard-coded, I could solve the problem by writing

original = 'This is a wonderland :\\)'

However, there is a whole file full of *original* values.
Is there any solution to this problem other than changing the original variable by preceding every parenthesis in it with a backslash?

kruk asked May 22 '14 11:05


1 Answer

You can use re.escape to handle this; it backslash-escapes the regex metacharacters in original so that the text is matched literally:

regexString = '(?<= = ")' + re.escape(original) + '(?=")'
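For illustration, here is a minimal end-to-end sketch of how that line might be used. The helper name translate_line, the sample line, and the Croatian text are hypothetical placeholders and not part of the original question; the callable replacement is an extra precaution assumed here so that any backslashes in the translated text are not read as group references.

```python
import re


def translate_line(line, original, translated):
    """Replace `original` on the right-hand side of a `"..." = "..."` line."""
    # re.escape turns metacharacters such as ')' into literal characters.
    regex_string = '(?<= = ")' + re.escape(original) + '(?=")'
    # A callable replacement inserts `translated` verbatim, without
    # interpreting backslashes in it as escapes or group references.
    return re.sub(regex_string, lambda match: translated, line)


original = 'This is a wonderland :)'
translated = 'Ovo je zemlja cudesa :)'  # placeholder translation
line = '"This is a wonderland :)" = "This is a wonderland :)"'

# Without escaping, the ')' in the text makes the pattern invalid.
try:
    re.compile('(?<= = ")' + original + '(?=")')
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False

result = translate_line(line, original, translated)
print(unescaped_compiles)  # False
print(result)  # "This is a wonderland :)" = "Ovo je zemlja cudesa :)"
```

Only the right-hand side changes, because the lookbehind requires the text to be preceded by ` = "`.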
Zero Piraeus answered Sep 29 '22 18:09