I've been trying to match the following string:
string = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
Unfortunately, my knowledge of regular expressions is very limited. As you can see, there are two sets of parentheses that need to be matched, along with the content inside the second one. I tried using re.match("\(w*\)", string), but it didn't work. Any help would be greatly appreciated.
One approach to checking for balanced parentheses is to use a stack. Each time an opening parenthesis is encountered, push it onto the stack; when a closing parenthesis is encountered, match it against the top of the stack and pop it. If a closing parenthesis arrives while the stack is empty, or the stack is not empty at the end, the string is unbalanced; otherwise it is balanced.
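The stack approach described above can be sketched in a few lines of Python. This is only an illustration: the function name is_balanced is made up for this example, and it only tracks round parentheses.

```python
def is_balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)      # remember the open paren
        elif ch == ")":
            if not stack:
                return False      # closing paren with nothing to match
            stack.pop()           # matched the most recent open paren
    return not stack              # leftover opens mean unbalanced


sample = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
print(is_balanced(sample))           # True
print(is_balanced("( ('a', 'b')"))   # False
```

A plain regular expression in Python's re module cannot check arbitrary nesting depth, which is why a stack (or a counter) is the usual tool for this part of the problem.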
Use Parentheses for Grouping and Capturing. By placing part of a regular expression inside round brackets or parentheses, you can group that part of the regular expression together. This allows you to apply a quantifier to the entire group or to restrict alternation to part of the regex.
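As a small illustration of grouping, the sketch below applies a quantifier to a whole group and restricts an alternation to part of the pattern. The sample strings and variable names are made up for the example.

```python
import re

# Quantifier applied to the whole group: "ab" repeated one or more times.
repeated = re.compile(r"(ab)+")
print(repeated.fullmatch("ababab") is not None)   # True
print(repeated.fullmatch("aba") is not None)      # False

# Alternation restricted to the group: only the file stem may vary.
stem = re.compile(r"(index|base)\.html")
print(stem.findall("index.html base.html other.html"))   # ['index', 'base']
```

Note that findall returns the captured group rather than the whole match when the pattern contains exactly one group.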
$ means "Match the end of the string" (the position after the last character in the string).
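Here is a short sketch of $ in Python, using the string from the question. One Python-specific detail: $ also matches just before a newline at the very end of the string, so \Z is the stricter choice if that matters.

```python
import re

w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"

ends_with_paren = re.compile(r"\)$")
print(ends_with_paren.search(w) is not None)            # True
print(ends_with_paren.search(w + " # x") is not None)   # False

# In Python, $ also matches just before a trailing newline; \Z does not.
print(ends_with_paren.search(w + "\n") is not None)     # True
print(re.search(r"\)\Z", w + "\n") is not None)         # False
```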
This one is pretty much what it sounds like: we want to literally match parentheses used in a string. Since parentheses are also used for capturing and non-capturing groups, we have to escape both the opening and the closing parenthesis with a backslash. An explanation of how literalRegex works: / opens or begins the regex.
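Applied to the string from the question, escaping looks roughly like this in Python. It also suggests why the original attempt failed: w* means "zero or more letter w" (the word-character class is \w), and re.match only matches at the start of the string. The variable names are illustrative.

```python
import re

w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"

# The attempt from the question finds nothing.
print(re.match(r"\(w*\)", w))    # None

# Escaped parens around "anything that is not a paren" find the inner pairs.
inner = re.findall(r"\([^()]*\)", w)
print(inner)   # ["('index.html', 'home')", "('base.html', 'base')"]

# re.escape builds an escaped pattern from a literal string.
literal = "('base.html', 'base')"
print(re.search(re.escape(literal), w).group(0) == literal)   # True
```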
Try this:
    import re

    w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"

    # find outer parens
    outer = re.compile(r"\((.+)\)")
    m = outer.search(w)
    inner_str = m.group(1)

    # find inner pairs
    innerre = re.compile(r"\('([^']+)', '([^']+)'\)")
    results = innerre.findall(inner_str)
    for x, y in results:
        print("%s <-> %s" % (x, y))
Output:
    index.html <-> home
    base.html <-> base
Explanation:
outer matches the first opening parenthesis with \( and a closing one with \). search returns the leftmost match, and because .+ is greedy it extends as far as it can, so the match ends at the last ) in the string, giving us the outermost ( ) pair. m.group(1) contains exactly what's between those outer parentheses; its content corresponds to the .+ bit of outer.
innerre matches exactly one of your ('a', 'b') pairs, again using \( and \) to match the literal parens in your input string, and using two groups inside the ' ' to capture the strings inside those single quotes.
Then, we use findall (rather than search or match) to get all matches for innerre (rather than just one). At this point results is a list of pairs, as demonstrated by the print loop.
Update: To match the whole thing, you could try something like this:
    rx = re.compile(r"^TEMPLATES = \(.+\)")
    rx.match(w)
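If it helps, the whole-line check and the pair extraction can be combined into one helper. This is just a sketch under assumptions: parse_templates, LINE_RE and PAIR_RE are names invented here, and fullmatch is used instead of match so that trailing text after the last parenthesis is rejected.

```python
import re

# Hypothetical names for this sketch; the patterns follow the answer above.
LINE_RE = re.compile(r"TEMPLATES = \((.+)\)")
PAIR_RE = re.compile(r"\('([^']+)', '([^']+)'\)")


def parse_templates(line):
    """Return the (file, name) pairs, or None if the line has the wrong shape."""
    m = LINE_RE.fullmatch(line)
    if m is None:
        return None
    return PAIR_RE.findall(m.group(1))


w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
print(parse_templates(w))                 # [('index.html', 'home'), ('base.html', 'base')]
print(parse_templates("OTHER = (1, 2)"))  # None
```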