Here's my current code:
return str.matches("^[A-Za-z\\-'. ]+");
I want it to include international letters. How do I do that in Java?
Thanks.
Special Regex Characters: These characters have special meaning in regex: . , + , * , ? , ^ , $ , ( , ) , [ , ] , { , } , | , \ . Escape Sequences (\char): To match a character that has special meaning in regex, you need to use an escape sequence, that is, prefix the character with a backslash ( \ ). E.g., \. matches "."
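As a small illustration of the escaping rule in Java: inside a Java string literal the backslash itself has to be doubled, so the regex \. is written "\\.". The sketch below also uses Pattern.quote, which is part of java.util.regex, to escape a whole literal at once. The class and method names (EscapeDemo, endsWithDotTxt, matchesLiterally) are made up for this example.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // "\\." in Java source is the regex \. and matches a literal dot only.
    public static boolean endsWithDotTxt(String s) {
        return s.matches(".*\\.txt");
    }

    // Pattern.quote wraps the text so every character is taken literally,
    // including special characters such as + or *.
    public static boolean matchesLiterally(String input, String literal) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(endsWithDotTxt("notes.txt"));
        System.out.println(endsWithDotTxt("notestxt"));
        System.out.println(matchesLiterally("a+b", "a+b"));
        System.out.println(matchesLiterally("aab", "a+b"));
    }
}
```

Without the quoting, the pattern a+b would treat + as a quantifier and accept "aab"; with it, only the exact text "a+b" matches.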
Regex support is part of the standard library of many programming languages, including Java and Python, and is built into the syntax of others, including Perl and ECMAScript.
It seems that what you want is to match all alphabetic characters. Typically you would do that with the POSIX \p{Alpha} expression, extended by the punctuation you also want to permit. As the Java regular expressions documentation says, by default it matches ASCII only.
However, what the documentation does not say clearly is that you can make this class work with Unicode characters. To do that, you need to turn on Unicode character class matching.
You can do this in one of two ways:
1. Compile a Pattern object, passing the UNICODE_CHARACTER_CLASS constant: Pattern p = Pattern.compile("^[\\p{Alpha}\\-'. ]+", Pattern.UNICODE_CHARACTER_CLASS);
2. Use the (?U) embedded pattern flag: str.matches("^(?U)[\\p{Alpha}\\-'. ]+");
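A minimal sketch of the first option as a reusable check, assuming you validate many strings and want to compile the pattern once. The class and method names (UnicodeNameCheck, isValidName, isAsciiName) are hypothetical; the second pattern is there only to show the ASCII-only default for comparison. The sample word is written with \u escapes so the file compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

public class UnicodeNameCheck {
    // Same character class as in the answer, compiled with Unicode
    // character class matching turned on.
    private static final Pattern UNICODE_NAME =
            Pattern.compile("[\\p{Alpha}\\-'. ]+", Pattern.UNICODE_CHARACTER_CLASS);

    // Same expression without the flag: \p{Alpha} is ASCII-only here.
    private static final Pattern ASCII_NAME =
            Pattern.compile("[\\p{Alpha}\\-'. ]+");

    public static boolean isValidName(String s) {
        // matches() tests the whole input, so no ^ or $ is needed.
        return UNICODE_NAME.matcher(s).matches();
    }

    public static boolean isAsciiName(String s) {
        return ASCII_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        // A Polish sample word (Z-dot, o-acute, l-stroke, c-acute).
        String polish = "\u017B\u00F3\u0142\u0107";
        System.out.println(isValidName(polish));
        System.out.println(isAsciiName(polish));
        System.out.println(isValidName("Jean-Marie Le'Blanc"));
    }
}
```

String.matches also tests the entire string, so the leading ^ in the one-line versions is harmless but redundant.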
Proof of concept:
String[] test = {"Jean-Marie Le'Blanc", "Żółć", "Ὀδυσσεύς", "原田雅彦"};
for (String str : test) {
    System.out.print(str.matches("^(?U)[\\p{Alpha}\\-'. ]+") + " ");
}
The obvious result is:
true true true true
If you think that all is correct, I have two additional points to make: