
Removing accents from String

Tags: java, string, regex

Recently I found a very helpful method in the StringUtils library (Apache Commons Lang):

StringUtils.stripAccents(String s)

It is really helpful for removing accents (diacritical marks) and converting characters to their plain ASCII "equivalent", for instance ç = c.
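For context, the effect of stripAccents can be roughly approximated with the JDK alone. The sketch below assumes the general technique (Unicode NFD decomposition followed by deleting combining marks); it is not the library's actual source, and the class and method names are hypothetical. It also shows why umlauts are lost: ö decomposes to o plus a combining diaeresis, and that diaeresis is deleted like any other mark.

```java
import java.text.Normalizer;

public class StripAccentsSketch {

    // Rough JDK-only approximation of "strip accents" (not the Commons Lang source):
    // NFD splits "ç" into "c" + U+0327 (combining cedilla),
    // then every combining mark (Unicode category M) is deleted.
    static String stripAllAccents(String s) {
        if (s == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAllAccents("ç"));                 // c
        System.out.println(stripAllAccents("Höchstalemannisch")); // Hochstalemannisch (umlaut lost)
    }
}
```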

Now I am working for a German customer who needs exactly this, but only for non-German characters: umlauts must stay untouched. I realised that stripAccents won't help here, because it would also turn ä into a, ö into o and ü into u.

Does anyone have experience with this? Are there any useful tools, libraries, classes, or perhaps regular expressions? I tried writing a class that parses the text and replaces such characters, but building that kind of map for all languages is very difficult.

Any suggestions appreciated.

wojtek asked Aug 21 '13 07:08


1 Answer

The best approach is to build a custom function, for example like the following. To exclude a character from conversion, remove it from UNICODE together with the character at the same position in PLAIN_ASCII, since the two constants are matched by index.

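public class AsciiConverter {

    // Each character in UNICODE is replaced by the character at the same
    // index in PLAIN_ASCII, so both strings must keep the same length.
    // To keep a character unchanged (e.g. the German umlauts Ä ä Ö ö Ü ü),
    // delete it here together with its counterpart in PLAIN_ASCII.
    private static final String UNICODE =
            "ÀàÈèÌìÒòÙùÁáÉéÍíÓóÚúÝýÂâÊêÎîÔôÛûŶŷÃãÕõÑñÄäËëÏïÖöÜüŸÿÅåÇçŐőŰű";
    private static final String PLAIN_ASCII =
            "AaEeIiOoUuAaEeIiOoUuYyAaEeIiOoUuYyAaOoNnAaEeIiOoUuYyAaCcOoUu";

    public static String toAsciiString(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length());
        for (int index = 0; index < str.length(); index++) {
            char c = str.charAt(index);
            int pos = UNICODE.indexOf(c);
            if (pos > -1) {
                // Mapped character: append its ASCII replacement.
                sb.append(PLAIN_ASCII.charAt(pos));
            } else {
                // Unmapped character (e.g. ß): keep it as-is.
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // With the table as given, ö is mapped too, so this prints "Hochstalemannisch".
        System.out.println(toAsciiString("Höchstalemannisch"));
    }
}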
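If maintaining a per-language table is the main pain point, one possible alternative (a different technique from the lookup table above) is to let java.text.Normalizer handle the decomposition and only keep a short list of characters to preserve. This is a hedged sketch, not a definitive implementation: SelectiveAccentStripper, stripAccentsExcept and GERMAN_CHARS are hypothetical names, and it assumes that "keep German characters" means ä, ö, ü, Ä, Ö, Ü and ß.

```java
import java.text.Normalizer;

public class SelectiveAccentStripper {

    // Hypothetical default: German umlauts and sharp s are left untouched.
    static final String GERMAN_CHARS = "äöüÄÖÜß";

    // Removes diacritics from every code point except those listed in `preserved`.
    static String stripAccentsExcept(String str, String preserved) {
        if (str == null) {
            return null;
        }
        // Compose first so "o" + U+0308 (combining diaeresis) is seen as a single 'ö'.
        String composed = Normalizer.normalize(str, Normalizer.Form.NFC);
        StringBuilder sb = new StringBuilder(composed.length());
        int i = 0;
        while (i < composed.length()) {
            int cp = composed.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (preserved.indexOf(cp) >= 0) {
                // Listed character: copy it unchanged.
                sb.append(ch);
            } else {
                // Decompose this one character and drop its combining marks (category M).
                String decomposed = Normalizer.normalize(ch, Normalizer.Form.NFD);
                sb.append(decomposed.replaceAll("\\p{M}", ""));
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: Höchstalemannisch facade a la creme
        System.out.println(stripAccentsExcept("Höchstalemannisch façade à la crème", GERMAN_CHARS));
    }
}
```

Characters without a Unicode decomposition (such as ø or ł) pass through unchanged, so a few of them may still need an explicit mapping like the table above.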
Paul Vargas answered Oct 03 '22 20:10