I took the following bit from the Oracle tutorial on Java regex:
Intersections
To create a single character class matching only the characters common to all of its nested classes, use &&, as in [0-9&&[345]]. This particular intersection creates a single character class matching only the numbers common to both character classes: 3, 4, and 5.
Enter your regex: [0-9&&[345]]
Enter input string to search: 3
I found the text "3" starting at index 0 and ending at index 1.
Why would this be useful? I mean, if one wants to match only 3, 4, and 5, why not just write [345] instead of "the intersection"?
Thanks in advance.
Let us consider a simple problem: match English consonants in a string. Listing out all consonants (or a list of ranges) would be one way:
[B-DF-HJ-NP-TV-Zb-df-hj-np-tv-z]
Another way is to use look-around:
(?=[A-Za-z])[^AEIOUaeiou]
(?![AEIOUaeiou])[A-Za-z]
Not sure if there is any other way to do this without the use of character class intersection.
Character class intersection solution (Java):
[A-Za-z&&[^AEIOUaeiou]]
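As a rough check that the three approaches above agree, here is a minimal sketch that runs the explicit range list, the look-ahead version, and the intersection version over the same input. The class name `ConsonantDemo`, the helper `findAll`, and the sample string are made up for illustration; only `java.util.regex` is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsonantDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "Hello, World 42!";

        String explicit = "[B-DF-HJ-NP-TV-Zb-df-hj-np-tv-z]"; // hand-written ranges
        String lookahead = "(?![AEIOUaeiou])[A-Za-z]";        // look-around
        String intersection = "[A-Za-z&&[^AEIOUaeiou]]";      // class intersection

        // All three lines print [H, l, l, W, r, l, d]
        System.out.println(findAll(explicit, input));
        System.out.println(findAll(lookahead, input));
        System.out.println(findAll(intersection, input));
    }
}
```

The intersection form states the intent ("a letter that is not a vowel") directly, whereas the explicit ranges have to be worked out by hand and are easy to get wrong.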
For .NET, there is no intersection, but there is character class subtraction:
[A-Za-z-[AEIOUaeiou]]
I don't know the implementation details, but I wouldn't be surprised if character class intersection/subtraction is faster than look-around, which is the cleanest alternative when character class operations are not available.
Another possible usage is when you have a pre-built character class and you want to remove some characters from it. One case I have come across where class intersection might be applicable is matching all whitespace characters except for newline.
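A minimal sketch of the whitespace case, assuming the pre-built class is Java's `\s` and the character to remove is `\n`. The class name `WhitespaceDemo` and the method `markNonNewlineWhitespace` are hypothetical names chosen for this example.

```java
class WhitespaceDemo {
    // \s intersected with "anything but \n": whitespace except newline.
    static final String WS_EXCEPT_NEWLINE = "[\\s&&[^\\n]]";

    // Hypothetical helper: replaces each matched character with '_' so the
    // effect is visible when printed.
    static String markNonNewlineWhitespace(String input) {
        return input.replaceAll(WS_EXCEPT_NEWLINE, "_");
    }

    public static void main(String[] args) {
        // The space and the tab are replaced; the newline is left alone.
        System.out.println(markNonNewlineWhitespace("a b\tc\nd"));
    }
}
```

With the input above, the space and tab become underscores while the line break between `c` and `d` survives.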
Another possible use case as @beerbajay has commented:
I think the built-in character classes are the main use case, e.g.
[\p{InGreek}&&\p{Ll}]
for lowercase Greek letters.
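To illustrate the comment above, here is a small sketch that applies that pattern to a mixed Greek/Latin string. The class name `GreekLowercaseDemo`, the helper `findAll`, and the sample text are assumptions for the example; the Greek letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreekLowercaseDemo {
    // Characters in the Greek block that are also lowercase letters.
    static final Pattern GREEK_LOWER = Pattern.compile("[\\p{InGreek}&&\\p{Ll}]");

    // Hypothetical helper: returns each matching character as a string.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = GREEK_LOWER.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Capital alpha, then lambda, phi, alpha in lowercase, then Latin "Beta".
        String input = "\u0391\u03bb\u03c6\u03b1 Beta";
        // Only the three lowercase Greek letters should match.
        System.out.println(findAll(input).size());
    }
}
```

Neither class alone is enough here: `\p{InGreek}` would also match the capital alpha, and `\p{Ll}` would also match the lowercase Latin letters in "Beta".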