
What is the point behind character class intersections in Java's Regex?

Tags:

java

regex

Java's java.util.regex.Pattern supports the following character class:

[a-z&&[def]]

which matches "d, e, or f" and is called an intersection.

Functionally this is no different from:

[def]

which is simpler to read and understand in a large regular expression. So my question is: what use are intersections, other than providing complete support for CSG-like operations on character classes?

(Please note, I understand the utility of subtractions like [a-z&&[^bc]] and [a-z&&[^m-p]], I am asking specifically about intersections as presented above.)
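To make the comparison concrete, here is a minimal sketch that lists which lowercase ASCII letters each of the character classes above matches. The class name `IntersectionDemo` and the helper `matchedLowercase` are made up for this illustration; only `java.util.regex.Pattern` is the real API.

```java
import java.util.regex.Pattern;

public class IntersectionDemo {
    // Hypothetical helper: returns every char in 'a'..'z' that the
    // given character class matches, in alphabetical order.
    static String matchedLowercase(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Intersection with a plain set: same result as [def].
        System.out.println(matchedLowercase("[a-z&&[def]]"));  // def
        System.out.println(matchedLowercase("[def]"));         // def
        // Intersection with a negated set acts as a subtraction.
        System.out.println(matchedLowercase("[a-z&&[^bc]]"));  // adefghijklmnopqrstuvwxyz
        System.out.println(matchedLowercase("[a-z&&[^m-p]]")); // abcdefghijklqrstuvwxyz
    }
}
```

The first two lines print the same set, which is the redundancy the question is about; the last two show the subtraction form where `&&` does real work.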

Christopher asked Jul 09 '09 20:07



2 Answers

Though I've never needed to do so, I could imagine a use with predefined character classes that aren't proper subsets of each other (so the intersection differs from both of the original classes). E.g. matching only lowercase Latin letters:

[\p{Ll}&&\p{InBasicLatin}]
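
As a rough sketch of how that class behaves, the snippet below tries it against a few single-character strings. The class name `PredefinedIntersectionDemo` and the helper `isLowerBasicLatin` are invented for the example; note that the backslashes have to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public class PredefinedIntersectionDemo {
    // Lowercase letters (\p{Ll}) that also lie in the Basic Latin block.
    // Neither class is a subset of the other, so the intersection is
    // narrower than both.
    static final Pattern LOWER_BASIC_LATIN =
            Pattern.compile("[\\p{Ll}&&\\p{InBasicLatin}]");

    // Hypothetical helper: true if the whole string is one matching char.
    static boolean isLowerBasicLatin(String s) {
        return LOWER_BASIC_LATIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "a": lowercase and Basic Latin.
        // "A": Basic Latin but uppercase.
        // "\u00e9" (e with acute): lowercase but in Latin-1 Supplement.
        // "1": Basic Latin but not a letter.
        for (String s : new String[] {"a", "A", "\u00e9", "1"}) {
            System.out.println(s + " -> " + isLowerBasicLatin(s));
        }
    }
}
```

Only "a" should come back as a match here, since it is the only input that belongs to both classes.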
iammichael answered Sep 22 '22 16:09


I believe that particular sample is just a "proof of concept." An intersection of two character classes matches only characters that belong to both classes. The subtractions you mentioned are the real practical applications of the operator.

Simply put, there is no hidden meaning.

Blixt answered Sep 23 '22 16:09