
Relationship between Alnum and IsAlphabetic character classes in Java RegEx patterns

Looking at the Javadoc for java.util.regex.Pattern

\p{Alnum} An alphanumeric character:[\p{IsAlphabetic}\p{IsDigit}]

it appears that every character that matches \p{IsAlphabetic} should also match \p{Alnum}

However, it does not seem to be the case when the character has an accent. For example, the following assertion fails:

assertEquals("é".matches("\\p{IsAlphabetic}+"),"é".matches("\\p{Alnum}+"));

The same thing happens with other accented characters such as ą, ó, ł, ź, and ż. All of them match \p{IsAlphabetic}+ but not \p{Alnum}+.

Am I misinterpreting the Javadoc, or is this a bug in the documentation or the implementation?

toniedzwiedz asked Feb 15 '26 04:02


2 Answers

Your quote from the documentation is accurate, but you missed the sentence that introduces that table:

The following Predefined Character classes and POSIX character classes are in conformance with the recommendation of Annex C: Compatibility Properties of Unicode Regular Expression, when UNICODE_CHARACTER_CLASS flag is specified.

Elsewhere on the documentation page you referenced, the POSIX character classes are listed as US-ASCII only: \p{Alnum} = [\p{Alpha}\p{Digit}], \p{Alpha} = [\p{Lower}\p{Upper}], \p{Lower} = [a-z], \p{Upper} = [A-Z], and \p{Digit} = [0-9].

So \p{Alnum} matches only ASCII letters and digits when the UNICODE_CHARACTER_CLASS flag is not set, while \p{IsAlphabetic} (the Unicode Alphabetic binary property, which covers every \p{L} letter plus some additional characters) matches Unicode letters by default, with no flag required.
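To make the difference concrete, here is a minimal sketch that compiles the same \p{Alnum}+ pattern with and without the flag. The class and field names are made up for illustration, and é is written as the escape \u00e9 so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AlnumFlagDemo {
    // Default: \p{Alnum} is the POSIX class, i.e. ASCII letters and digits only.
    static final Pattern ASCII_ALNUM = Pattern.compile("\\p{Alnum}+");

    // With the flag, \p{Alnum} becomes [\p{IsAlphabetic}\p{IsDigit}].
    static final Pattern UNICODE_ALNUM =
            Pattern.compile("\\p{Alnum}+", Pattern.UNICODE_CHARACTER_CLASS);

    // \p{IsAlphabetic} is a Unicode binary property and needs no flag.
    static final Pattern ALPHABETIC = Pattern.compile("\\p{IsAlphabetic}+");

    public static void main(String[] args) {
        String s = "\u00e9"; // é, LATIN SMALL LETTER E WITH ACUTE
        System.out.println(ASCII_ALNUM.matcher(s).matches());   // false
        System.out.println(UNICODE_ALNUM.matcher(s).matches()); // true
        System.out.println(ALPHABETIC.matcher(s).matches());    // true
    }
}
```

Pattern.UNICODE_CHARACTER_CLASS is available from Java 7 onward.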

Wiktor Stribiżew answered Feb 17 '26 16:02


By default \p{Alnum} is treated as a POSIX character class which means it will only ever match ASCII characters. This means it will match a and 1 but not ä or ١.
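A quick way to check those four characters yourself is sketched below. The class and method names are hypothetical, and the non-ASCII characters are written as Unicode escapes.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PosixAlnumDefaults {
    // No flags: \p{Alnum} is the ASCII-only POSIX class.
    private static final Pattern ALNUM = Pattern.compile("\\p{Alnum}");

    static boolean isAlnum(String s) {
        return ALNUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAlnum("a"));      // true:  ASCII letter
        System.out.println(isAlnum("1"));      // true:  ASCII digit
        System.out.println(isAlnum("\u00e4")); // false: ä is outside ASCII
        System.out.println(isAlnum("\u0661")); // false: ١ (ARABIC-INDIC DIGIT ONE) is outside ASCII
    }
}
```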

The passage you quote only applies when the UNICODE_CHARACTER_CLASS flag is used.

Slightly oversimplified, this flag will turn the "old" POSIX style character classes into their equivalent Unicode character classes.
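If you use String.matches, as in the question, there is no place to pass a flag constant, but the same flag can be switched on inside the pattern with the embedded expression (?U). A minimal sketch follows, with hypothetical class and method names:

```java
// Hypothetical demo class; the names are illustrative only.
class EmbeddedFlagDemo {
    // (?U) enables UNICODE_CHARACTER_CLASS for the rest of the pattern.
    static boolean unicodeAlnum(String s) {
        return s.matches("(?U)\\p{Alnum}+");
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // é
        System.out.println(e.matches("\\p{Alnum}+"));        // false: POSIX, ASCII only
        System.out.println(unicodeAlnum(e));                 // true:  Unicode version
        System.out.println(e.matches("\\p{IsAlphabetic}+")); // true:  no flag needed
    }
}
```

With (?U) added to the \p{Alnum}+ pattern, both sides of the comparison in the question are true for é.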

Joachim Sauer answered Feb 17 '26 17:02


