
Python: How to match nested parentheses with regex?

I'm trying to match a mathematical-expression-like string that has nested parentheses.

import re

p = re.compile(r'\(.+\)')
s = '(((1+0)+1)+1)'
print(p.findall(s))

['(((1+0)+1)+1)']

I want it to match all the enclosed expressions, such as (1+0), ((1+0)+1)...
I don't even care if it matches unwanted ones like (((1+0); I can take care of those.

Why isn't it doing that already, and how can I do it?

Cinco asked Mar 28 '11 03:03



1 Answer

As others have mentioned, regular expressions are not the way to go for nested constructs. I'll give a basic example using pyparsing:

import pyparsing # make sure you have this installed

thecontent = pyparsing.Word(pyparsing.alphanums) | '+' | '-'
parens     = pyparsing.nestedExpr( '(', ')', content=thecontent)

Here's a usage example:

>>> parens.parseString("((a + b) + c)")

Output:

(                          # all of str
 [
  (                        # ((a + b) + c)
   [
    (                      #  (a + b)
     ['a', '+', 'b'], {}   
    ),                     #  (a + b)      [closed]
    '+',
    'c'
   ], {}
  )                        # ((a + b) + c) [closed]
 ], {}  
)                          # all of str    [closed]

(Line breaks, indentation, and comments added manually.)

Edit: Modified to eliminate unnecessary Forward, as per Paul McGuire's suggestions.

To get the output in nested list format:

res = parens.parseString("((12 + 2) + 3)")
res.asList()

Output:

[[['12', '+', '2'], '+', '3']]
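
If all you need is the list of enclosed sub-expressions from the question, such as (1+0) and ((1+0)+1), a small stack-based scan may be enough. Below is a minimal sketch in plain Python, assuming the goal is every balanced parenthesized substring. The function name enclosed_expressions is my own and is not part of re or pyparsing.

```python
def enclosed_expressions(text):
    """Return every balanced parenthesized substring of text,
    ordered by the position of its closing parenthesis."""
    stack = []   # indices of '(' characters that are still open
    found = []
    for i, ch in enumerate(text):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            if not stack:
                raise ValueError("unmatched ')' at index %d" % i)
            start = stack.pop()           # matching '(' for this ')'
            found.append(text[start:i + 1])
    if stack:
        raise ValueError("unmatched '(' at index %d" % stack[-1])
    return found


print(enclosed_expressions('(((1+0)+1)+1)'))
# ['(1+0)', '((1+0)+1)', '(((1+0)+1)+1)']
```

The stack remembers where each open parenthesis started, which is the kind of counting a single regex pass with findall does not do, since findall returns only non-overlapping matches.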
phooji answered Sep 18 '22 15:09