 

Accent in Regular Expression in Java

I'd like to use Hibernate Validator to validate some columns. The problem, as I understand it, is that the \w character class in Java doesn't match letters with accents.

Is there any way I could write the regex so that words like Relatório are validated? I wouldn't want to list every accented letter between brackets, because I expect to be using this regex on a lot of columns.
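A minimal sketch of the behaviour described above, assuming Java 7 or later; the class and field names are illustrative only. In the default mode `\w` is ASCII-only (`[a-zA-Z_0-9]`), so "Relatório" fails. The `Pattern.UNICODE_CHARACTER_CLASS` flag, or the equivalent embedded flag `(?U)` at the start of the pattern string, switches `\w` to Unicode rules.

```java
import java.util.regex.Pattern;

public class WordCheck {
    // Default mode: \w means [a-zA-Z_0-9], so accented letters are rejected.
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // Java 7+: with UNICODE_CHARACTER_CLASS, \w follows Unicode rules.
    static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    // Same effect via the embedded flag (?U), usable where only a pattern string is accepted.
    static final Pattern EMBEDDED_FLAG = Pattern.compile("(?U)\\w+");

    public static void main(String[] args) {
        String word = "Relat\u00F3rio"; // "Relatorio" with o-acute, written as a Unicode escape
        System.out.println(ASCII_WORD.matcher(word).matches());    // false
        System.out.println(UNICODE_WORD.matcher(word).matches());  // true
        System.out.println(EMBEDDED_FLAG.matcher(word).matches()); // true
    }
}
```

The embedded `(?U)` form is the one that would fit where only a pattern string can be supplied, such as the `regexp` attribute of a Bean Validation `@Pattern` annotation; that combination is untested here.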

Rafael, asked Apr 20 '11 at 16:04


People also ask

What is the use of \\s+ in Java?

The regular expression \s matches a single whitespace character, while \s+ matches one or more consecutive whitespace characters. In a Java string literal they are written "\\s" and "\\s+".
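A short illustration, with an illustrative class name: splitting on "\\s+" treats any run of spaces, tabs or newlines as a single separator. The `trim()` call is there because a leading run of whitespace would otherwise produce an empty first element.

```java
import java.util.Arrays;

public class WhitespaceSplit {
    // Trim first, then split on runs of one or more whitespace characters.
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Mixed spaces, a tab and blank lines all collapse into single separators.
        System.out.println(Arrays.toString(words("  one \t two\n\nthree ")));
        // prints [one, two, three]
    }
}
```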

How do you remove accents from characters in Java?

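First decompose the string with Normalizer.normalize(string, Normalizer.Form.NFD), then strip everything outside ASCII: string = string.replaceAll("[^\\p{ASCII}]", ""); If the text contains non-ASCII characters you want to keep (Greek or Cyrillic letters, for instance), remove only the combining marks instead: string = string.replaceAll("\\p{M}", "");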
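A minimal sketch comparing the two replacements; the class and method names are illustrative. Both start from NFD-normalized text. The sample string mixes an accented Latin letter with a Greek capital Sigma (written as Unicode escapes to keep the source ASCII) to show where the two approaches differ.

```java
import java.text.Normalizer;

public class StripAccents {
    // NFD splits U+00F3 (o with acute) into 'o' followed by U+0301 (combining acute).
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Removes every non-ASCII character, so e.g. Greek capital Sigma is lost too.
    static String asciiOnly(String s) {
        return decompose(s).replaceAll("[^\\p{ASCII}]", "");
    }

    // Removes only combining marks (general category M); other letters survive.
    static String marksOnly(String s) {
        return decompose(s).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String text = "Relat\u00F3rio \u03A3";
        System.out.println(asciiOnly(text)); // "Relatorio " (Sigma dropped)
        System.out.println(marksOnly(text)); // "Relatorio " followed by the Sigma
    }
}
```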

What does \\ mean in Java regex?

The backslash \ is an escape character in Java string literals, so to put a single backslash into a string you must write \\. To hand the regex \w to the regex engine, write "\\w" in your Java source.
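A small demonstration of the two escaping layers (Java string literal, then regex), using an illustrative class name. The literal "\\w+" is only three characters long once compiled, and matching one literal backslash in the input takes four backslashes in the source.

```java
public class BackslashDemo {
    // Source literal "\\w+" is the 3-character string \w+ seen by the regex engine.
    static final String WORD_REGEX = "\\w+";

    public static void main(String[] args) {
        System.out.println(WORD_REGEX.length());           // 3
        System.out.println("abc_123".matches(WORD_REGEX)); // true
        // To match one literal backslash, the regex needs \\ , i.e. "\\\\" in source.
        System.out.println("a\\b".matches("a\\\\b"));      // true
    }
}
```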


2 Answers

The Java regex documentation has a section on Unicode categories (search for "Classes for Unicode blocks and categories"). If you're just looking for letters, I think \p{L} is the category you want.
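A small sketch of the `\p{L}` suggestion; the class name and the `LETTERS_REGEX` constant are hypothetical. In Java source the pattern is written "\\p{L}+". One caveat: if the input arrives in decomposed form (a base letter followed by U+0301, the combining acute accent), `\p{L}+` rejects it because the accent is a mark, not a letter; adding `\p{M}` to the class accepts both forms.

```java
import java.util.regex.Pattern;

public class LetterValidation {
    // Hypothetical shared constant; a compile-time constant like this
    // could also be used as an annotation's regexp value.
    static final String LETTERS_REGEX = "[\\p{L}\\p{M}]+";

    // \p{L}: any Unicode letter, with or without an accent.
    static final Pattern LETTERS = Pattern.compile("\\p{L}+");
    // Letters plus combining marks, so decomposed (NFD) input also passes.
    static final Pattern LETTERS_AND_MARKS = Pattern.compile(LETTERS_REGEX);

    public static void main(String[] args) {
        String precomposed = "Relat\u00F3rio"; // o-acute as one code point
        String decomposed = "Relato\u0301rio"; // 'o' + U+0301 combining acute
        System.out.println(LETTERS.matcher(precomposed).matches());          // true
        System.out.println(LETTERS.matcher(decomposed).matches());           // false
        System.out.println(LETTERS_AND_MARKS.matcher(decomposed).matches()); // true
        System.out.println(LETTERS.matcher("Relatorio1").matches());         // false
    }
}
```

Keeping the pattern in one constant and referencing it from each `@Pattern(regexp = ...)` annotation would avoid repeating it across many columns, since annotation values accept compile-time `String` constants.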

Rachel Shallit, answered Oct 06 '22 at 10:10


I had more luck with:

\p{InCombiningDiacriticalMarks}+

In Java I use the following method:

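import java.text.Normalizer;
import java.text.Normalizer.Form;

public class AccentUtils { // class name is illustrative; the answer shows only the method
    // NFD splits each accented letter into a base letter plus combining marks,
    // which the regex then deletes.
    public static String removeAccents(String text) {
        return text == null ? null :
            Normalizer.normalize(text, Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
}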
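One way this method could serve the original validation question, offered as an assumption about usage rather than something the answer states: strip the accents, then apply the plain ASCII `\w+` check. The `isValidWord` helper and class name are hypothetical, and `removeAccents` is re-declared so the snippet compiles on its own. Note the limit of the approach: letters such as U+00F8 (o with stroke) have no canonical decomposition, so they survive NFD and still fail `\w+`.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class RemoveAccentsCheck {
    // Same logic as the removeAccents method in the answer, re-declared so this compiles alone.
    static String removeAccents(String text) {
        return text == null ? null :
            Normalizer.normalize(text, Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Hypothetical helper: strip accents first, then apply the plain ASCII \w+ check.
    static boolean isValidWord(String text) {
        return text != null && removeAccents(text).matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(removeAccents("Relat\u00F3rio")); // Relatorio
        System.out.println(isValidWord("Relat\u00F3rio"));   // true
        // U+00F8 has no decomposition, so it survives NFD and fails \w+.
        System.out.println(isValidWord("S\u00F8ren"));       // false
    }
}
```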
Havnar, answered Oct 06 '22 at 10:10