
Ignoring diacritic characters when comparing words with special characters (é, è, ...)

I have a list of Belgian city names containing diacritics (Liège, Quiévrain, Franière, etc.), and I would like to strip those marks so I can compare the names with a list containing the same names in upper case but without the diacritical marks (LIEGE, QUIEVRAIN, FRANIERE).

What I first tried was converting to upper case:

"LIEGE".contentEquals("Liège".toUpperCase()), but that doesn't work because the upper case of Liège is LIÈGE, not LIEGE.

I have some more complicated ideas, like replacing each accented character individually, but that seems clumsy and tedious.

Any ideas on how to do this in a smart way?

Waza_Be asked Jul 09 '10 11:07



1 Answer

As of Java 6, you can use java.text.Normalizer:

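import java.text.Normalizer;

// Decompose accented characters (NFD), then drop everything that is not ASCII,
// which removes the combining marks and leaves the base letters.
public String unaccent(String s) {
    String normalized = Normalizer.normalize(s, Normalizer.Form.NFD);
    return normalized.replaceAll("[^\\p{ASCII}]", "");
}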

Note that in Java 5 there is also a sun.text.Normalizer, but its use is strongly discouraged since it's part of Sun's proprietary API and has been removed in Java 6.
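
To tie this back to the question, here is a minimal sketch of how the method could be used to compare the two lists. The class name CityNameMatcher and the helper sameCity are made up for illustration and are not part of any library. The accented letters are written as Unicode escapes only so that the file compiles regardless of source encoding.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical class name, used only for this illustration.
class CityNameMatcher {

    // Same approach as above: decompose to NFD, then strip non-ASCII characters.
    // Caveat: letters with no ASCII base form (for example "ß") are dropped entirely.
    static String unaccent(String s) {
        String normalized = Normalizer.normalize(s, Normalizer.Form.NFD);
        return normalized.replaceAll("[^\\p{ASCII}]", "");
    }

    // Hypothetical helper: true if the accented name matches the plain upper-case name.
    static boolean sameCity(String accented, String plainUpper) {
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        return unaccent(accented).toUpperCase(Locale.ROOT).equals(plainUpper);
    }

    public static void main(String[] args) {
        // "Liège", "Quiévrain", "Franière" written with Unicode escapes.
        String[] accented = {"Li\u00e8ge", "Qui\u00e9vrain", "Frani\u00e8re"};
        String[] plain = {"LIEGE", "QUIEVRAIN", "FRANIERE"};
        for (int i = 0; i < accented.length; i++) {
            System.out.println(plain[i] + " matches: " + sameCity(accented[i], plain[i]));
        }
    }
}
```

Each of the three pairs from the question should compare as equal with this approach. If the plain list might not always be upper case, comparing with equalsIgnoreCase after unaccenting is another option.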

Stijn Van Bael answered Sep 25 '22 01:09
