 

Replace a character with different characters depending on which character it is

I have searched SO (and Google) but not found any fully matching answer to my question:

I want to replace all Swedish characters and whitespace in a String with other characters. I would like it to work as follows:

  • "å" and "ä" should be replaced with "a"
  • "ö" should be replaced with "o"
  • "Å" and "Ä" should be replaced with "A"
  • "Ö" should be replaced with "O"
  • " " should be replaced with "-"

Can this be achieved with regex (or any other way), and if so, how?

Of course, the method below does the job (and I know it could be improved, for example by replacing "å" and "ä" in a single call):

private String changeSwedishCharactersAndWhitespace(String string) {
    // Each call must work on the result of the previous one (newString),
    // otherwise only the last replacement would survive.
    String newString = string.replaceAll("å", "a");
    newString = newString.replaceAll("ä", "a");
    newString = newString.replaceAll("ö", "o");
    newString = newString.replaceAll("Å", "A");
    newString = newString.replaceAll("Ä", "A");
    newString = newString.replaceAll("Ö", "O");
    newString = newString.replaceAll(" ", "-");
    return newString;
}

I know how to use regex to replace, for example, all "å", "ä", or "ö" with "". The question is: how do I use regex to replace a character with a different character depending on which character it is? Surely there must be a better way using regex than the approach above?

asked Nov 15 '12 at 11:11 by Magnilex



3 Answers

For Latin characters with diacritics, Unicode normalization (java.text.Normalizer) can split each character into its base letter plus a combining diacritic mark, and the marks can then be stripped. Something like:

import java.text.Normalizer;
newString = Normalizer.normalize(string,
        Normalizer.Form.NFKD).replaceAll("\\p{M}", "");
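Below is a fuller sketch of this approach, wrapped in a hypothetical class `SwedishNormalizer`. It uses `Normalizer.Form.NFD` (canonical decomposition). NFKD, as above, also applies compatibility mappings, but for å, ä and ö both forms yield the same base letter. Normalization alone does not turn spaces into hyphens, so that step is added separately. Note also that this strips diacritics from every letter (é, ü, ñ and so on), which is broader than the mapping in the question.

```java
import java.text.Normalizer;

public class SwedishNormalizer {

    // Decomposes each character (NFD), removes combining marks such as the
    // ring on "å" and the diaeresis on "ä"/"ö", then swaps spaces for hyphens.
    static String changeSwedishCharactersAndWhitespace(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}", "");
        return withoutMarks.replace(' ', '-');
    }

    public static void main(String[] args) {
        // prints "Asa-ater-ol-pa-Oland"
        System.out.println(changeSwedishCharactersAndWhitespace("Åsa äter öl på Öland"));
    }
}
```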
answered Nov 15 '22 at 13:11 by Joop Eggen



You can use StringUtils.replaceEach from Apache Commons Lang, like this:

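import org.apache.commons.lang3.StringUtils; // Apache Commons Lang 3

private String changeSwedishCharactersAndWhitespace(String string) {
    // Each search string is replaced by the replacement at the same index.
    String newString = StringUtils.replaceEach(string,
      new String[] {"å", "ä", "ö", "Å", "Ä", "Ö", " "},
      new String[] {"a", "a", "o", "A", "A", "O", "-"});
    return newString;
}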
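If adding Apache Commons Lang is not an option, a similar single pass can be sketched with the JDK alone by mapping each character through a `switch`. The class name `SwedishReplacer` is hypothetical, and the sketch assumes the input uses precomposed characters (NFC), so a decomposed "a" followed by a combining ring would not be matched.

```java
public class SwedishReplacer {

    // Maps one character to its replacement; any other character is kept as-is.
    private static char replacementFor(char c) {
        switch (c) {
            case 'å':
            case 'ä':
                return 'a';
            case 'ö':
                return 'o';
            case 'Å':
            case 'Ä':
                return 'A';
            case 'Ö':
                return 'O';
            case ' ':
                return '-';
            default:
                return c;
        }
    }

    // Walks the string once, appending the mapped version of each character.
    static String changeSwedishCharactersAndWhitespace(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            sb.append(replacementFor(input.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "Arlig-al"
        System.out.println(changeSwedishCharactersAndWhitespace("Ärlig ål"));
    }
}
```

Because every character is looked at exactly once, a replacement can never be matched again by a later rule, which is the same property `replaceEach` provides.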
answered Nov 15 '22 at 12:11 by ShyJ



I don't think a single regex can replace all of these characters at once. However, you can simplify the replacement work by using a HashMap.

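// Needs: import java.util.HashMap; import java.util.Map;
Map<String, String> map = new HashMap<>();
map.put("ä", "a"); /*put others*/

String newString = string;
for (Map.Entry<String, String> entry : map.entrySet())
    // Chain on newString so earlier replacements are not discarded.
    newString = newString.replaceAll(entry.getKey(), entry.getValue());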
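Assuming Java 9 or later, the lookup map can also be paired with a single regex pass, because `Matcher.replaceAll` there accepts a function that computes the replacement for each match. This is a sketch with a hypothetical class name (`RegexLookupReplacer`). The character class has to be kept in sync with the map keys by hand.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLookupReplacer {

    // Lookup table from each character to replace to its replacement (Map.of is Java 9+).
    private static final Map<String, String> REPLACEMENTS = Map.of(
            "å", "a", "ä", "a", "ö", "o",
            "Å", "A", "Ä", "A", "Ö", "O",
            " ", "-");

    // A character class matching exactly the keys of the table above.
    private static final Pattern TO_REPLACE = Pattern.compile("[åäöÅÄÖ ]");

    static String changeSwedishCharactersAndWhitespace(String input) {
        Matcher matcher = TO_REPLACE.matcher(input);
        // Java 9+: the function receives each match and returns its replacement.
        // quoteReplacement guards against '$' or '\' in the replacement values.
        return matcher.replaceAll(match -> Matcher.quoteReplacement(REPLACEMENTS.get(match.group())));
    }

    public static void main(String[] args) {
        // prints "Hej-pa-dig"
        System.out.println(changeSwedishCharactersAndWhitespace("Hej på dig"));
    }
}
```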
answered Nov 15 '22 at 13:11 by Juvanis
