As I understand it, \p{L} matches all Unicode letters, while \p{Alpha} is much the same but only for Latin (ASCII) letters. At work I have both a Latin 'A' and a Cyrillic 'A', and \p{Alpha} in our old Java code doesn't match Cyrillic characters as letters. In my tests, \p{L} solves the problem. Can you give me some advice on this situation, and which one should I use in Java code? This page, http://www.regular-expressions.info/posixbrackets.html, uses \p{Alpha} for Java code.
Actually, \p{Alpha} is a POSIX character class that matches only US-ASCII letters by default. It matches non-ASCII letters only when the UNICODE_CHARACTER_CLASS flag (or the embedded (?U) flag) is set. \p{L}, by contrast, always matches any Unicode letter, with or without that flag. Note that you can also write \p{L} as \pL or \p{IsL}.
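To illustrate the equivalent spellings, here is a minimal sketch. The class name LetterSyntaxDemo and the helper allMatch are made up for this example; only Pattern.matches comes from the JDK.

```java
import java.util.regex.Pattern;

public class LetterSyntaxDemo {
    // Three spellings of the Unicode "Letter" general category.
    static final String[] SPELLINGS = {"\\p{L}+", "\\pL+", "\\p{IsL}+"};

    // Hypothetical helper: true only if every spelling matches the whole input.
    static boolean allMatch(String input) {
        for (String regex : SPELLINGS) {
            if (!Pattern.matches(regex, input)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allMatch("Abc"));  // Latin letters => true
        System.out.println(allMatch("Абв"));  // Cyrillic letters => true
        System.out.println(allMatch("Abc1")); // contains a digit => false
    }
}
```

All three spellings should behave identically, so choosing between them is a matter of readability.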
More details from the reference:

Both \p{L} and \p{IsL} denote the category of Unicode letters.

POSIX character classes (US-ASCII only)
\p{Lower}    A lower-case alphabetic character: [a-z]
\p{Upper}    An upper-case alphabetic character: [A-Z]
\p{Alpha}    An alphabetic character: [\p{Lower}\p{Upper}]
Have a look at the following demo:
String l = "Abc";
String c = "Абв";
System.out.println(l.matches("\\p{Alpha}+")); // => true
System.out.println(c.matches("\\p{Alpha}+")); // => false
System.out.println(c.matches("(?U)\\p{Alpha}+")); // => true
System.out.println(l.matches("\\p{L}+")); // => true
System.out.println(c.matches("\\p{L}+")); // => true
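If the patterns are reused, they can be precompiled, and the flag can be passed to Pattern.compile instead of embedding (?U) in the regex. The sketch below assumes a made-up class LetterMatcher with made-up helper names; Pattern.compile and Pattern.UNICODE_CHARACTER_CLASS are the real JDK API (Java 7 and later).

```java
import java.util.regex.Pattern;

public class LetterMatcher {
    // POSIX class without the flag: US-ASCII letters only.
    static final Pattern ASCII_ALPHA = Pattern.compile("\\p{Alpha}+");
    // Same class with the flag passed at compile time, equivalent to a (?U) prefix.
    static final Pattern UNICODE_ALPHA =
            Pattern.compile("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS);
    // Unicode Letter category: needs no flag.
    static final Pattern UNICODE_LETTER = Pattern.compile("\\p{L}+");

    static boolean isAsciiAlpha(String s) {
        return ASCII_ALPHA.matcher(s).matches();
    }

    static boolean isUnicodeAlpha(String s) {
        return UNICODE_ALPHA.matcher(s).matches();
    }

    static boolean isLetters(String s) {
        return UNICODE_LETTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String latinA = "A";          // U+0041 LATIN CAPITAL LETTER A
        String cyrillicA = "\u0410";  // U+0410 CYRILLIC CAPITAL LETTER A
        String deseret = "\uD801\uDC00"; // U+10400, a letter outside the BMP

        System.out.println(isAsciiAlpha(latinA));      // => true
        System.out.println(isAsciiAlpha(cyrillicA));   // => false
        System.out.println(isUnicodeAlpha(cyrillicA)); // => true
        System.out.println(isLetters(cyrillicA));      // => true
        System.out.println(isLetters(deseret));        // => true
    }
}
```

For the situation in the question, \p{L} is probably the simpler choice, since it does not depend on a flag being set wherever the pattern is compiled.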