I need to create a Pattern that will match all Unicode digits and alphabetic characters. So far I have "\\p{IsAlphabetic}|[0-9]".
The first part is working well for me: it does a good job of identifying non-Latin characters as alphabetic. The problem is the second half. Obviously it will only work for the Arabic numerals 0-9. The character classes \d and \p{Digit} are also just [0-9] by default. The javadoc for Pattern does not seem to mention a character class for Unicode digits. Does anyone have a good solution for this problem?
For my purposes, I would accept a way to match the set of all characters for which Character.isDigit returns true.
Quoting the Java docs about isDigit:
A character is a digit if its general category type, provided by getType(codePoint), is DECIMAL_DIGIT_NUMBER.
So, I believe the pattern to match digits should be \p{Nd}.
Here's a working example at ideone. As you can see, the results are consistent between Pattern.matches and Character.isDigit.
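In case the ideone link is unavailable, here is a minimal sketch of that kind of comparison. It is not the original linked code; the class name UnicodeDigitCheck and its helper methods are made up for illustration. It checks each code point against \p{Nd} and against Character.isDigit and counts any disagreements.

```java
import java.util.regex.Pattern;

public class UnicodeDigitCheck {
    // \p{Nd} is the Unicode general category Decimal_Number,
    // the same category Character.isDigit is documented to test.
    private static final Pattern ND = Pattern.compile("\\p{Nd}");

    /** Returns true if the given code point, as a one-character string, matches \p{Nd}. */
    static boolean matchesNd(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return ND.matcher(s).matches();
    }

    /** Counts code points where the regex and Character.isDigit disagree. */
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (matchesNd(cp) != Character.isDigit(cp)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(matchesNd('3'));      // ASCII digit three
        System.out.println(matchesNd(0x0E53));   // Thai digit three
        System.out.println(matchesNd(0x0969));   // Devanagari digit three
        System.out.println(matchesNd(0x2167));   // Roman numeral eight (category Nl, not Nd)
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Note that \p{Nd} covers decimal digits only, so letter-like numerals such as Roman numeral characters are not matched, which is the same behaviour as Character.isDigit.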
Use \d, but with the (?U) flag to enable the Unicode version of predefined character classes and POSIX character classes:
(?U)\d+
or in code:
System.out.println("3๓३".matches("(?U)\\d+")); // true
Using (?U) is equivalent to compiling the regex by calling Pattern.compile() with the UNICODE_CHARACTER_CLASS flag:
Pattern pattern = Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS);
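To tie this back to the original question (digits and alphabetic characters together), here is a rough sketch of two ways the pieces might be combined. The class name UnicodeAlnum and the helper isAlnum are hypothetical, and requiring the whole string to match is an assumption about what the asker wants.

```java
import java.util.regex.Pattern;

public class UnicodeAlnum {
    // Explicit form: the Alphabetic binary property plus general category Nd.
    static final Pattern EXPLICIT =
            Pattern.compile("[\\p{IsAlphabetic}\\p{Nd}]+");

    // Flag-based form: with UNICODE_CHARACTER_CLASS, \p{Alpha} and \d
    // use their Unicode definitions instead of the ASCII-only ones.
    static final Pattern FLAGGED =
            Pattern.compile("[\\p{Alpha}\\d]+", Pattern.UNICODE_CHARACTER_CLASS);

    /** Returns true if the entire string consists of letters and decimal digits. */
    static boolean isAlnum(Pattern pattern, String s) {
        return pattern.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Cyrillic letters followed by an Arabic-Indic digit nine.
        String mixed = "\u0434\u0430\u0669";
        System.out.println(isAlnum(EXPLICIT, mixed));
        System.out.println(isAlnum(FLAGGED, mixed));
        // A space is neither alphabetic nor a digit.
        System.out.println(isAlnum(EXPLICIT, "a b"));
    }
}
```

The explicit form leaves every other predefined class in the pattern at its default meaning, while the flag changes all of them at once, so which one fits depends on what else the regex contains.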