I have the following regex:
String regExpression = "^[a-zA-Z0-9+,. '-]{1,"+maxCharacters+"}$";
which works fine for me, except that it doesn't allow letters with diacritics (Ă ă Â â Î î Ș ș Ț ț).
I only need my current regex to accept these letters in addition to what it already allows.
Any help is appreciated. Thanks.
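You need to look into the POSIX character classes to catch those. Sadly, Java regex doesn't support language-specific POSIX classes, but maybe one of these will fit:
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
Note that in Java these classes are US-ASCII only by default; they cover non-ASCII letters only when the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS.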
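A quick way to check how \p{Print} behaves on the sample text is sketched below. The class name PosixClassDemo and the helper allPrintable are made up for illustration, and the sample letters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // Hypothetical helper: true if every character of input matches \p{Print}
    // when the pattern is compiled with the given flags.
    static boolean allPrintable(String input, int flags) {
        return Pattern.compile("^\\p{Print}+$", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // A-breve, a-breve, S-comma, s-comma, T-comma, t-comma, separated by spaces
        String data = "\u0102 \u0103 \u0218 \u0219 \u021A \u021B";

        // Default mode: POSIX classes are US-ASCII only, so the diacritics are rejected.
        System.out.println(allPrintable(data, 0)); // false

        // With UNICODE_CHARACTER_CLASS the same class follows the Unicode definitions.
        System.out.println(allPrintable(data, Pattern.UNICODE_CHARACTER_CLASS)); // true
    }
}
```

Keep in mind that \p{Print} with the flag accepts far more than letters (punctuation, symbols, digits from any script), so it is much looser than the original character class.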
The best fit, as suggested by Sorin, is probably \p{L} (any Unicode letter).
import java.util.regex.Pattern;

public class Regexer {
    public static void main(String[] args) {
        int maxCharacters = 100;
        String data = "Ă ă Â â Î î Ș ș Ț ț";
        // \p{L} replaces a-zA-Z and matches a letter from any script
        String pattern = "^[\\p{L}0-9+,. '-]{1," + maxCharacters + "}$";
        Pattern p = Pattern.compile(pattern);
        // matches() requires the whole input to match the pattern
        if (p.matcher(data).matches()) {
            System.out.println("Hit");
        } else {
            System.out.println("No");
        }
    }
}
This works for me.
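One case the pattern above may miss is input where a diacritic arrives as a base letter followed by a separate combining mark (decomposed form), because a combining mark is not in \p{L}. A minimal sketch of one way to handle that, assuming you are free to normalize the input first: the class NameValidator and its methods are hypothetical names, and adding \p{M} (combining marks) to the class is an extra on top of the answer, not part of it.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class NameValidator {
    // Hypothetical helper: same character class as above, plus \p{M}
    // so that any combining marks left after normalization still match.
    static Pattern buildPattern(int maxCharacters) {
        return Pattern.compile("^[\\p{L}\\p{M}0-9+,. '-]{1," + maxCharacters + "}$");
    }

    static boolean isValid(String input, int maxCharacters) {
        // NFC merges a base letter and its combining mark into one
        // precomposed character wherever Unicode defines one.
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFC);
        return buildPattern(maxCharacters).matcher(normalized).matches();
    }

    public static void main(String[] args) {
        // Precomposed S-comma (U+0218) followed by "tefan"
        System.out.println(isValid("\u0218tefan", 100));
        // Decomposed form: 'S' followed by COMBINING COMMA BELOW (U+0326)
        System.out.println(isValid("S\u0326tefan", 100));
        // '!' is not in the character class
        System.out.println(isValid("abc!", 100));
    }
}
```

Normalizing before matching also keeps the length limit closer to what a user sees, since a decomposed letter would otherwise count as two characters against maxCharacters.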