...when used in patterns like "\\p{someCharacterClass}".
I've used/seen some:
What is the definitive list of all supported built-in character classes? Where is it documented? What are the exact meanings?
There seem to be a lot of "RTFM" answers referring to the javadoc for Pattern. That's the first place I looked before asking this question. Just so everyone is clear, the javadoc for Pattern makes no mention of any of the classes listed above.
The "correct" answer will mention "InCombiningDiacriticalMarks" somewhere on the page, and will not be some vague reference to "Unicode Standards".
Java has no regular expression syntax built into the language itself, but the standard java.util.regex package can be imported to work with regular expressions. The package includes the following classes:
Pattern class - defines a pattern (to be used in a search).
Matcher class - used to search for the pattern.
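As a minimal sketch of how the two classes work together (the class name PatternMatcherDemo and the helper firstNumber are made up for illustration), a Pattern is compiled once and a Matcher then searches the input with it:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
public class PatternMatcherDemo {
    // Returns the first run of digits in the input, or null if there is none.
    static String firstNumber(String input) {
        Pattern pattern = Pattern.compile("\\d+");   // defines the pattern
        Matcher matcher = pattern.matcher(input);    // searches the input for it
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 42 shipped")); // 42
        System.out.println(firstNumber("no digits here"));   // null
    }
}
```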
In the context of regular expressions, a character class is a set of characters enclosed within square brackets. It specifies the characters that will successfully match a single character from a given input string.
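A small sketch of a bracketed character class, assuming a hypothetical helper isAbc: the class [abc] matches exactly one character, and only if that character is a, b, or c.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
public class BracketClassDemo {
    // True if the whole input is a single character from the set {a, b, c}.
    static boolean isAbc(String s) {
        return Pattern.matches("[abc]", s);
    }

    public static void main(String[] args) {
        System.out.println(isAbc("b"));  // true
        System.out.println(isAbc("d"));  // false
        System.out.println(isAbc("ab")); // false: the class matches one character only
    }
}
```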
The java.util.regex package provides the following classes and interfaces for regular expressions.
The documentation for Pattern says in the "Unicode Support" section:
The supported categories are those of The Unicode Standard in the version specified by the Character class. The category names are those defined in the Standard, both normative and informative. The block names supported by Pattern are the valid block names accepted and defined by UnicodeBlock.forName.
The documentation for UnicodeBlock.forName states:
Block names are determined by The Unicode Standard.
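The link between the two javadoc statements can be checked directly. The following is a rough sketch, with the class name BlockNameDemo and both helpers invented for illustration: a name that UnicodeBlock.forName accepts can be used in a pattern with the "In" prefix, as in \p{InCombiningDiacriticalMarks}.

```java
import java.lang.Character.UnicodeBlock;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
public class BlockNameDemo {
    // True if UnicodeBlock.forName accepts the given block name.
    static boolean isKnownBlock(String name) {
        try {
            UnicodeBlock.forName(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // forName rejects names that are not valid block names
        }
    }

    // True if every char of s belongs to the named block, tested via \p{In<name>}.
    // Assumes blockName is a valid block name written without spaces.
    static boolean inBlock(String blockName, String s) {
        return Pattern.matches("\\p{In" + blockName + "}+", s);
    }

    public static void main(String[] args) {
        System.out.println(isKnownBlock("CombiningDiacriticalMarks"));   // true
        System.out.println(isKnownBlock("Combining Diacritical Marks")); // true
        System.out.println(isKnownBlock("NoSuchBlock"));                 // false
        System.out.println(inBlock("CombiningDiacriticalMarks", "\u0301")); // true
        System.out.println(inBlock("CombiningDiacriticalMarks", "e"));      // false
    }
}
```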
On http://unicode.org there is the FAQ "Where can I find the definitive list of Unicode blocks?":
A: The Unicode blocks and their names are a normative part of the Unicode Standard. The exact list is always maintained in one of the files of the Unicode Character Database, Blocks.txt.
Finally, in Blocks.txt there is the line:
0300..036F; Combining Diacritical Marks
These characters can be found in the Combining Diacritical Marks code chart (from Unicode 6.0 Character Code Charts).
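To tie the Blocks.txt range back to the regex class, here is a sketch (the class CombiningMarksDemo and its helpers are hypothetical) that checks the boundaries of 0300..036F against \p{InCombiningDiacriticalMarks}. It also shows one common use of the class, which is an assumption about typical usage rather than something stated in the documentation quoted above: removing accents after NFD decomposition.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
public class CombiningMarksDemo {
    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // True if the code point falls in the Combining Diacritical Marks block.
    static boolean isCombiningMark(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return MARKS.matcher(s).matches();
    }

    // Decomposes accented letters (NFD), then drops the combining marks.
    static String stripMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(isCombiningMark(0x0300)); // true: first code point of the block
        System.out.println(isCombiningMark(0x036F)); // true: last code point of the block
        System.out.println(isCombiningMark(0x02FF)); // false: just below the block
        System.out.println(isCombiningMark(0x0370)); // false: just above the block
        System.out.println(stripMarks("caf\u00e9")); // cafe
    }
}
```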