
Difference between \p{Alpha} and \p{L} in Java

Tags:

java

regex

As I understand it, \p{L} matches all Unicode letters, while \p{Alpha} is similar but matches only Latin (ASCII) letters. At work I have both a Latin 'A' and a Cyrillic 'A', and \p{Alpha} in our old Java code doesn't match Cyrillic characters as letters. In my tests, \p{L} solves the problem. Can you give me some advice on this situation and on which one I should use in Java code? The page http://www.regular-expressions.info/posixbrackets.html uses \p{Alpha} for Java code.

Марат Кравченко asked Dec 27 '15 12:12



1 Answer

Actually, \p{Alpha} is a POSIX character class that matches only ASCII letters by default. It matches non-ASCII letters only when used together with the UNICODE_CHARACTER_CLASS flag (or the inline (?U) flag). \p{L}, by contrast, always matches any Unicode letter, including letters outside the BMP, because Java regexes match by code point. Note that you can also write \p{L} as \pL or \p{IsL}.
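As a rough illustration of the point above, the following sketch checks the three spellings of the letter category and compiles \p{Alpha} with Pattern.UNICODE_CHARACTER_CLASS instead of the inline (?U). The class name LetterClassSpellings and the helper matchesWhole are made up for this example; only Pattern and its flag come from the JDK.

```java
import java.util.regex.Pattern;

class LetterClassSpellings {
    // Hypothetical helper: true when the whole input matches the regex
    // compiled with the given flags (0 means no flags).
    static boolean matchesWhole(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String cyrillic = "\u0410\u0431\u0432"; // the Cyrillic letters "Абв"

        // Three spellings of the same Unicode "letter" category.
        for (String regex : new String[] {"\\p{L}+", "\\pL+", "\\p{IsL}+"}) {
            System.out.println(regex + " -> " + matchesWhole(regex, 0, cyrillic));
        }

        // \p{Alpha} without flags is ASCII-only; with the flag it follows Unicode.
        System.out.println("\\p{Alpha}+ (no flag) -> "
                + matchesWhole("\\p{Alpha}+", 0, cyrillic));
        System.out.println("\\p{Alpha}+ (UNICODE_CHARACTER_CLASS) -> "
                + matchesWhole("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS, cyrillic));
    }
}
```

Passing the flag to Pattern.compile is handy when the pattern string comes from configuration and should not be edited to add (?U).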

See more reference details:

Both \p{L} and \p{IsL} denote the category of Unicode letters.

POSIX character classes (US-ASCII only)
\p{Lower} A lower-case alphabetic character: [a-z]
\p{Upper} An upper-case alphabetic character: [A-Z]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]

Have a look at the following demo:

String l = "Abc";
String c = "Абв";
System.out.println(l.matches("\\p{Alpha}+"));     // => true
System.out.println(c.matches("\\p{Alpha}+"));     // => false
System.out.println(c.matches("(?U)\\p{Alpha}+")); // => true
System.out.println(l.matches("\\p{L}+"));         // => true
System.out.println(c.matches("\\p{L}+"));         // => true
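For the specific case in the question, a Latin 'A' next to a look-alike Cyrillic 'А', a small sketch along these lines may help. It writes the two characters as Unicode escapes so they cannot be confused in source code. The class and method names are hypothetical, and the script class \p{IsCyrillic} is an extra that the answer above does not mention; it assumes Java 7 or later.

```java
import java.util.regex.Pattern;

class LatinVsCyrillicA {
    // Compiled once; all three patterns test a single character.
    private static final Pattern ASCII_ALPHA = Pattern.compile("\\p{Alpha}");
    private static final Pattern ANY_LETTER = Pattern.compile("\\p{L}");
    private static final Pattern CYRILLIC = Pattern.compile("\\p{IsCyrillic}");

    // True only for ASCII letters (default, flag-free \p{Alpha}).
    static boolean isAsciiAlpha(String s) {
        return ASCII_ALPHA.matcher(s).matches();
    }

    // True for a letter from any script.
    static boolean isLetter(String s) {
        return ANY_LETTER.matcher(s).matches();
    }

    // True for a character that belongs to the Cyrillic script.
    static boolean isCyrillic(String s) {
        return CYRILLIC.matcher(s).matches();
    }

    public static void main(String[] args) {
        String latinA = "\u0041";    // LATIN CAPITAL LETTER A
        String cyrillicA = "\u0410"; // CYRILLIC CAPITAL LETTER A

        for (String s : new String[] {latinA, cyrillicA}) {
            System.out.println(s + ": Alpha=" + isAsciiAlpha(s)
                    + ", L=" + isLetter(s)
                    + ", Cyrillic=" + isCyrillic(s));
        }
    }
}
```

If the code only needs to accept letters from any alphabet, \p{L} alone is enough; the script check is useful only when the two look-alike letters must be told apart.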
Wiktor Stribiżew answered Nov 20 '22 07:11
