I'm currently working through a book on regular expressions, and one of the practice problems is to write a regular expression that match Strings that have the same number of a's as z's and b's as y's. I came up with the following regular expression so far.
^(?=[^az]*([az][^az]*[az][^az]*)*$)(?=[^by]*([by][^by]*[by][^by]*)*$).*$
The issue with this is that it incorrectly matches when a's and z's are even and b's and y's are even (i.e. azzz would match, but has more z's than a's). Is there a way to modify my regular expression to match correctly or am I pursuing the wrong approach?
With some regex engines, you can use pre-defined subroutines to (clumsily) define context-free grammars, though the syntax varies from engine to engine and isn't standardized. Observe (still incomplete, but getting there):
(?(DEFINE)
(?'all'(?&az)|(?&by)|(?&abzy)|(?&bayz))
(?'az'a(?&all)*z|z(?&all)*a)
(?'by'b(?&all)*y|y(?&all)*b)
(?'abzy'
a(?&all)*b(?&all)*z(?&all)*y|
a(?&all)*y(?&all)*z(?&all)*b|
z(?&all)*b(?&all)*a(?&all)*y|
z(?&all)*y(?&all)*a(?&all)*b
)
(?'bayz'
b(?&all)*a(?&all)*y(?&all)*z|
b(?&all)*z(?&all)*y(?&all)*a|
y(?&all)*a(?&all)*b(?&all)*z|
y(?&all)*z(?&all)*b(?&all)*a
)
)
^(?&all)+$
Demo on Regex101
What this does is define a set of subpatterns and apply them recursively. Using the ^
and $
anchors in the actual "pattern" makes sure that the entire string matches them. Simplicity itself.
Though, if you actually do something like this in a production environment, someone is liable to come shoot you when they find it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With