I need a regular expression that can find any unmatched brace (opening or closing) in a string potentially containing matching parentheses.
The question exists here on stackoverflow, but I haven't found a regex-based solution that works.
I came up with a regex that finds unmatched open braces \((?![^)]+\))
using a negative lookahead, but I can't seem to figure out the opposite one required for unmatched closing braces.
EDIT: The above regex to find unmatched open braces doesn't work as intended. E.g. it will miss cases where multiple open braces are followed by a single closing brace (see also comments)
Here is my test string that I've been experimenting with on Rubular:
one) ((two) (three) four) (five)))
Note that the string can contain any type of character, including quotes, dashes, etc.
The short answer is that you can't find unmatched parentheses with regular expressions. Regular expressions encode regular languages, whereas the language of all properly matched parentheses is a context-free language.
Here's a sort-of-regex-based solution :)
def balanced?( str, open='(', close=')' )
re = Regexp.new( "[\\#{open}\\#{close}]" )
str.scan(re).inject(0) do |lv,c|
break :overclosed if lv < 0
lv + (c==open ? 1 : -1)
end == 0
end
s1 = "one) ((two) (three) four) (five)))"
s2 = "((one) ((two) (three) four) (five))"
s3 = "((one) ((two) (three) four) (five)"
puts balanced?(s1), #=> false
balanced?(s2), #=> true
balanced?(s3) #=> false
Ruby's Oniguruma library can parse LALR(n) grammars, including HTML. Citing the README:
r = Regexp.compile(<<'__REGEXP__'.strip, Regexp::EXTENDED)
(?<element> \g<stag> \g<content>* \g<etag> ){0}
(?<stag> < \g<name> \s* > ){0}
(?<name> [a-zA-Z_:]+ ){0}
(?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
(?<etag> </ \k<name+1> >){0}
\g<element>
__REGEXP__
p r.match('<foo>f<bar>bbb</bar>f</foo>').captures
The above code is of course much simpler than a real HTML parser, but it matches nested tags. Also, you should note that it is incredibly simple to make a regex which would be very slow (in the range of minutes to parse a 80-symbol string).
It's better to use a real parser like Treetop for this task.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With