
Python regex: matching a parenthesis within parenthesis

Tags: python, regex

I've been trying to match the following string:

string = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))" 

Unfortunately, my knowledge of regular expressions is very limited. As you can see, there are two levels of parentheses that need to be matched, along with the content inside the second one. I tried using re.match("\(w*\)", string), but it didn't work. Any help would be greatly appreciated.

Paulo asked Mar 18 '11 20:03


1 Answer

Try this:

import re

w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"

# find outer parens
outer = re.compile(r"\((.+)\)")
m = outer.search(w)
inner_str = m.group(1)

# find inner pairs
innerre = re.compile(r"\('([^']+)', '([^']+)'\)")

results = innerre.findall(inner_str)
for x, y in results:
    print("%s <-> %s" % (x, y))

Output:

index.html <-> home
base.html <-> base

Explanation:

outer matches the first opening parenthesis with \( and a closing parenthesis with \). search returns the leftmost match, and because .+ is greedy it extends to the last ) in the string, giving us the outermost ( ) pair. m.group(1) contains exactly what's between those outer parentheses; it corresponds to the (.+) group of outer.
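To see why the outer pattern ends up spanning the outermost pair, here is a small sketch comparing the greedy .+ with its non-greedy form .+?. The names greedy and lazy are only for illustration and are not part of the code above.

```python
import re

w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"

# Greedy: .+ consumes as much as possible, then backtracks to the last ")".
greedy = re.search(r"\((.+)\)", w).group(1)

# Non-greedy: .+? stops at the first ")" that lets the pattern succeed.
lazy = re.search(r"\((.+?)\)", w).group(1)

print(repr(greedy))  # " ('index.html', 'home'), ('base.html', 'base')"
print(repr(lazy))    # " ('index.html', 'home'"
```

With the non-greedy version the capture ends at the first closing parenthesis, which is why the greedy form is the one that fits here.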

innerre matches exactly one of your ('a', 'b') pairs, again using \( and \) to match the content parens in your input string, and using two groups inside the ' ' to match the strings inside of those single quotes.

Then, we use findall (rather than search or match) to get all matches for innerre (rather than just one). At this point results is a list of pairs, as demonstrated by the print loop.
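As a rough illustration of the difference, the sketch below runs both search and findall with the same inner pattern. The inner_str value is written out by hand here so the snippet runs on its own, and the dict conversion at the end is just one possible way to use the pairs.

```python
import re

# Hand-copied content of the outer parentheses, so this runs standalone.
inner_str = " ('index.html', 'home'), ('base.html', 'base')"
innerre = re.compile(r"\('([^']+)', '([^']+)'\)")

# search stops at the first match and returns a match object (or None).
first = innerre.search(inner_str)

# findall returns one tuple of groups per match, for every match.
pairs = innerre.findall(inner_str)

print(first.groups())  # ('index.html', 'home')
print(pairs)           # [('index.html', 'home'), ('base.html', 'base')]

# Optional: the pairs convert directly to a dict keyed by file name.
lookup = dict(pairs)
print(lookup["base.html"])  # base
```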

Update: To match the whole thing, you could try something like this:

rx = re.compile(r"^TEMPLATES = \(.+\)")
rx.match(w)
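A short sketch of what this pattern does and does not check. The two extra test strings are made up for illustration. Note that the pattern only requires some ( followed later by some ), so it does not verify that the parentheses are balanced.

```python
import re

w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
rx = re.compile(r"^TEMPLATES = \(.+\)")

# match returns a match object on success and None otherwise.
m = rx.match(w)
whole = m.group(0) if m else None
print(whole == w)  # True: the match covers the entire string

# A different assignment (hypothetical input) does not match.
other = rx.match("OTHER = (1, 2)")
print(other)  # None

# An unbalanced input (hypothetical) still matches: the regex
# does not count opening and closing parentheses.
unbalanced = rx.match("TEMPLATES = ( ('a', 'b')")
print(unbalanced is not None)  # True
```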
phooji answered Oct 12 '22 08:10