Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can regex match intersection between two regular expressions?

Tags:

regex

Given several regular expressions, can we write a regular expressions which is equal to their intersection?

For example, given two regular expressions c[a-z][a-z] and [a-z][aeiou]t, their intersection contains cat and cut and possibly more. How can we write a regular expression for their intersection?

Thanks.

like image 269
Tim Avatar asked Jun 08 '14 02:06

Tim


People also ask

Can you use intersection in regular expressions?

Intersection is not a part of standard regular expressions.

How do you match expressions in regex?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

Is a regular expression closed under intersection?

Regular Languages are closed under intersection, i.e., if L1 and L2 are regular then L1 ∩ L2 is also regular.


1 Answers

First, let's agree on terms. My syntactical assumption will be that

The intersection of several regexes is one regex that matches strings that each of the component regexes also match.

The General Option

To check for the intersection of two patterns, the general method is (pseudo-code):

if match(regex1) && match(regex2) { champagne for everyone! }

The Regex Option

In some cases, you can do the same with lookaheads, but for a complex regex there is little benefit of doing so, apart from making your regex more obscure to your enemies. Why little benefit? Because the engine will have to parse the whole string multiple times anyway.

Boolean AND

The general pattern for an AND checking that a string exactly meets regex1 and regex2 would be:

^(?=regex1$)(?=regex2$)

The $ in each lookahead ensures that each string matches the pattern and nothing more.

Matching when AND

Of course, if you don't want to just check the boolean value of the AND but also do some actual matching, after the lookaheads, you can add a dot-star to consume the string:

^(?=regex1$)(?=regex2$).*

Or... After checking the first condition, just match the second:

^(?=regex1$)regex2$

This is a technique used for instance in password validation. For more details on this, see Mastering Lookahead and Lookbehind.

Bonus section: Union of Regexes

Instead of working on an intersection, let's say you are interested in the union of the following regexes, i.e., a regex that matches either of those regexes:

  1. catch
  2. cat1
  3. cat2
  4. cat3
  5. cat5

This is accomplished with the alternation | operator:

catch|cat1|cat2|cat3|cat5

Furthermore, such a regex can often be compressed, as in:

cat(?:ch|[1-35]) 
like image 120
zx81 Avatar answered Oct 19 '22 21:10

zx81