Given several regular expressions, can we write a regular expressions which is equal to their intersection?
For example, given two regular expressions c[a-z][a-z]
and [a-z][aeiou]t
, their intersection contains cat
and cut
and possibly more. How can we write a regular expression for their intersection?
Thanks.
Intersection is not a part of standard regular expressions.
To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).
Regular Languages are closed under intersection, i.e., if L1 and L2 are regular then L1 ∩ L2 is also regular.
First, let's agree on terms. My syntactical assumption will be that
The intersection of several regexes is one regex that matches strings that each of the component regexes also match.
The General Option
To check for the intersection of two patterns, the general method is (pseudo-code):
if match(regex1) && match(regex2) { champagne for everyone! }
The Regex Option
In some cases, you can do the same with lookaheads, but for a complex regex there is little benefit of doing so, apart from making your regex more obscure to your enemies. Why little benefit? Because the engine will have to parse the whole string multiple times anyway.
Boolean AND
The general pattern for an AND checking that a string exactly meets regex1 and regex2 would be:
^(?=regex1$)(?=regex2$)
The $
in each lookahead ensures that each string matches the pattern and nothing more.
Matching when AND
Of course, if you don't want to just check the boolean value of the AND but also do some actual matching, after the lookaheads, you can add a dot-star to consume the string:
^(?=regex1$)(?=regex2$).*
Or... After checking the first condition, just match the second:
^(?=regex1$)regex2$
This is a technique used for instance in password validation. For more details on this, see Mastering Lookahead and Lookbehind.
Bonus section: Union of Regexes
Instead of working on an intersection, let's say you are interested in the union of the following regexes, i.e., a regex that matches either of those regexes:
This is accomplished with the alternation |
operator:
catch|cat1|cat2|cat3|cat5
Furthermore, such a regex can often be compressed, as in:
cat(?:ch|[1-35])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With