I have a dictionary (e.g. English-Croatian). It may contain sentences and phrases. I'm translating a file of the form "english text" = "english text"
into the form "english text" = "croatian text"
and I'm using Python's regex module (re) to do so.
The regex I'm using looks like this (given a variable original, which holds the English text that should be translated):
regexString = '(?<= = ")'+original+'(?=")'
That way I'm able to capture exactly the English text inside the quotes on the right-hand side and substitute it with Croatian. However, a problem appears if the original text contains a parenthesis. For example:
original = 'This is a wonderland :)'
In that case an "unbalanced parenthesis" error is raised. If original were hard-coded, I could solve the problem by writing
original = 'This is a wonderland :\\)'
However, there is a whole file full of original values.
Is there any solution to this problem other than changing the original variable by preceding every parenthesis in it with a backslash?
Use Parentheses for Grouping and Capturing. By placing part of a regular expression inside round brackets or parentheses, you can group that part of the regular expression together. This allows you to apply a quantifier to the entire group or to restrict alternation to part of the regex.
The way we solve this problem—i.e., the way we match a literal open parenthesis '(' or close parenthesis ')' using a regular expression—is to put backslash-open parenthesis '\(' or backslash-close parenthesis '\)' in the RE. This is another example of an escape sequence.
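A minimal sketch of that escape sequence in Python, using the string from the question. The sample sentence being searched is made up for illustration.

```python
import re

original = 'This is a wonderland :)'

# The unescaped ')' has no matching '(' so the pattern fails to compile.
try:
    re.compile(original)
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    print('re.error:', exc)

# With a backslash in front, ')' is matched as a literal character.
escaped = r'This is a wonderland :\)'
match = re.search(escaped, 'Alice said: This is a wonderland :)')
print(match.group(0))
```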
[] denotes a character class. () denotes a capturing group. [a-z0-9] matches one character that is in the range a-z or 0-9. (a-z0-9) matches and captures the literal text a-z0-9.
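The difference can be checked with a short sketch. The test string 'a-z0-9 x7' is an arbitrary example chosen for illustration.

```python
import re

text = 'a-z0-9 x7'

# Character class: each match is a single character from a-z or 0-9.
# The hyphen and the space in `text` are not in the class, so they are skipped.
class_matches = re.findall(r'[a-z0-9]', text)
print(class_matches)

# Capturing group: the pattern matches the literal seven characters "a-z0-9".
group_match = re.search(r'(a-z0-9)', text)
print(group_match.group(1))

# The group pattern does not match ordinary letters or digits on their own.
no_match = re.search(r'(a-z0-9)', 'x7')
print(no_match)
```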
One approach to checking for balanced parentheses is to use a stack. Each time an open parenthesis is encountered, push it onto the stack; when a closing parenthesis is encountered, match it with the top of the stack and pop it. A closing parenthesis that arrives while the stack is empty means the text is unbalanced. If the stack is empty at the end, the text is balanced; otherwise it is unbalanced.
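The stack approach might look roughly like this. The function name is_balanced is hypothetical, and the sketch only considers round brackets; it is not how the re module itself reports the error.

```python
def is_balanced(text):
    """Return True if every '(' in text has a matching ')' after it."""
    stack = []
    for ch in text:
        if ch == '(':
            stack.append(ch)
        elif ch == ')':
            # A ')' with nothing on the stack has no opening partner.
            if not stack:
                return False
            stack.pop()
    # Anything left on the stack is an unclosed '('.
    return not stack


print(is_balanced('(a(b)c)'))
print(is_balanced('This is a wonderland :)'))
```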
You can use re.escape to handle this. It backslash-escapes the regex special characters in the string, so the text is matched literally:
regexString = '(?<= = ")' + re.escape(original) + '(?=")'
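Here is a rough sketch of how this could be applied to one line of the file. The helper translate_line and the Croatian text are placeholders invented for illustration; only the pattern itself comes from the question.

```python
import re


def translate_line(line, original, translation):
    # re.escape turns ')' into '\)' (and likewise for other metacharacters),
    # so `original` is matched as literal text.
    pattern = '(?<= = ")' + re.escape(original) + '(?=")'
    # A function as the replacement keeps any backslashes in
    # `translation` from being interpreted as group references.
    return re.sub(pattern, lambda m: translation, line)


line = '"This is a wonderland :)" = "This is a wonderland :)"'
result = translate_line(line, 'This is a wonderland :)', 'Ovo je zemlja snova :)')
print(result)
```

Only the right-hand side changes, because the lookbehind requires the text to be preceded by ` = "`.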