
Pattern objects not matching with different languages

Tags: java, regex

I have the following regular expression, which works fine when the user's input is in English but always fails when it contains Portuguese characters.

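import java.util.regex.Matcher;
import java.util.regex.Pattern;

// [a-zA-Z] covers only the 26 unaccented ASCII letters in each case
Pattern p = Pattern.compile("^[a-zA-Z]*$");
Matcher matcher = p.matcher(fieldName);

// matches() succeeds only if the whole input fits the pattern
if (!matcher.matches())
{
   ....
}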

Is there any way to get the pattern object to recognise valid Portuguese characters such as ÁÂÃÀÇÉÊÍÓÔÕÚç....?

Thanks

Thomas Buckley asked Dec 13 '22 06:12



1 Answer

You want a regular expression that matches the class of all alphabetic letters. Across all the scripts of the world there are a great many of those, but luckily we can tell Java 6's RE engine that we're after a letter, and it will use Unicode character classes to do the rest. In particular, the L class, written \p{L} in regex syntax, matches all types of letters: upper case, lower case, and “oh, that concept doesn't apply in my language”:

Pattern p = Pattern.compile("^\\p{L}*$");
// the rest is identical, so won't repeat it...

When reading the docs, remember that backslashes need to be doubled when placed in a Java string literal, to stop the Java compiler from interpreting them as something else. (Also be aware that this RE is not suitable for things like validating people's names, which is an entirely different and much more difficult problem.)
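As a rough illustration of the points above, here is a minimal, self-contained sketch that compares the original ASCII-only pattern with the \p{L} version. The class name LetterCheck, the helper isLettersOnly and the sample inputs are made up for this example and are not from the question. The accented letters are written as Unicode escapes only so that the source file stays ASCII and compiles regardless of the file encoding.

```java
import java.util.regex.Pattern;

public class LetterCheck {
    // Regex syntax: ^\p{L}*$  (each backslash is doubled inside the Java literal)
    private static final Pattern LETTERS_ONLY = Pattern.compile("^\\p{L}*$");

    // The ASCII-only pattern from the question, kept for comparison
    private static final Pattern ASCII_ONLY = Pattern.compile("^[a-zA-Z]*$");

    // Hypothetical helper: true when the input consists solely of Unicode letters
    public static boolean isLettersOnly(String fieldName) {
        return LETTERS_ONLY.matcher(fieldName).matches();
    }

    public static void main(String[] args) {
        // "acao" spelled with c-cedilla (U+00E7) and a-tilde (U+00E3)
        String portuguese = "a\u00E7\u00E3o";

        System.out.println(ASCII_ONLY.matcher(portuguese).matches()); // false
        System.out.println(isLettersOnly(portuguese));                // true
        System.out.println(isLettersOnly("abc123"));                  // false: digits are not letters
        System.out.println(isLettersOnly(""));                        // true: * also accepts empty input
    }
}
```

Two things worth noting: matches() already requires the whole input to match, so the ^ and $ anchors are harmless but redundant here, and because of the * quantifier an empty string passes, so use + instead if at least one letter is required.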

Donal Fellows answered Dec 14 '22 21:12
