
Java change áéőűú to aeouu [duplicate]

Tags:

java

string

Possible Duplicates:
Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars
Is there a way to get rid of accents and convert a whole string to regular letters?

How can I do this? Thanks for the help.

lacas asked Nov 08 '10 08:11


3 Answers

I think your question is the same as these:

  • Java - getting rid of accents and converting them to regular letters
  • Converting Java String to ascii

and hence the answer is also the same:

String convertedString = Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");

See

  • JavaDoc: Normalizer.normalize(String, Normalizer.Form)
  • JavaDoc: Normalizer.Form.NFD
  • Sun Java Tutorial: Normalizer's API

Example Code:

final String input = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ";
System.out.println(
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "")
);

Output:

This is a funky String
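
For the string in the question title, the same approach can be wrapped in a small self-contained class. This is only a sketch: the class name AsciiFold and the method name toAscii are made up for illustration. The non-ASCII input is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

final class AsciiFold {

    // Hypothetical helper: decompose, then drop every non-ASCII character.
    static String toAscii(String input) {
        // NFD splits an accented letter into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // The combining marks are outside ASCII, so this removes them.
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // The five accented letters from the question title.
        System.out.println(toAscii("\u00e1\u00e9\u0151\u0171\u00fa")); // prints aeouu
        // U+00F8 (o with stroke) has no canonical decomposition,
        // so the ASCII filter deletes it entirely.
        System.out.println(toAscii("S\u00f8ren")); // prints Sren
    }
}
```

Note the second call: letters that have no canonical decomposition are dropped rather than converted, so this filter can silently shorten the text.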

Sean Patrick Floyd answered Sep 21 '22 19:09


You can use java.text.Normalizer to separate base letters and diacritics, then remove the latter via a regexp:

public static String stripDiacriticas(String s) {
    return Normalizer.normalize(s, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
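
A runnable version of this method needs both imports, since Form is a nested type of Normalizer. The sketch below wraps it in a class whose name, DiacriticStripper, is made up for illustration. It also shows how this differs from filtering on \p{ASCII}: only characters in the Combining Diacritical Marks block (U+0300 to U+036F) are removed, so other non-ASCII characters pass through unchanged.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

final class DiacriticStripper {

    static String stripDiacriticas(String s) {
        // Decompose into base letters plus combining marks, then delete only the marks.
        return Normalizer.normalize(s, Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        System.out.println(stripDiacriticas("\u00e1\u00e9\u0151\u0171\u00fa")); // prints aeouu
        // U+00F8 (o with stroke) does not decompose and is not a combining mark,
        // so it is kept as-is instead of being deleted.
        System.out.println(stripDiacriticas("S\u00f8ren"));
    }
}
```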

Michael Borgwardt answered Sep 22 '22 19:09


First - you shouldn't. These symbols carry special phonetic properties which should not be ignored.

The way to convert them is to create a Map that holds each pair:

Map<Character, Character> map = new HashMap<Character, Character>();
map.put('á', 'a');
map.put('é', 'e');
//etc..

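and then loop over the chars in the string, building a new string by calling map.get(currentChar) and keeping the original character whenever the map has no entry for it (map.get returns null in that case).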
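
The loop described above might look like the following sketch. The class name MapReplacer and the method name replaceMapped are made up for illustration, the map holds only the five letters from the question title, and Map.getOrDefault assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

final class MapReplacer {

    private static final Map<Character, Character> MAP = new HashMap<>();
    static {
        // Unicode escapes keep the example independent of the source file encoding.
        MAP.put('\u00e1', 'a'); // a with acute
        MAP.put('\u00e9', 'e'); // e with acute
        MAP.put('\u0151', 'o'); // o with double acute
        MAP.put('\u0171', 'u'); // u with double acute
        MAP.put('\u00fa', 'u'); // u with acute
        // etc.
    }

    // Hypothetical helper: replace mapped characters, pass everything else through.
    static String replaceMapped(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Fall back to the original character when there is no mapping.
            char mapped = MAP.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceMapped("\u00e1\u00e9\u0151\u0171\u00fa")); // prints aeouu
    }
}
```

Unlike the Normalizer-based answers, this handles only the characters listed explicitly, which gives full control over each replacement at the cost of maintaining the table by hand.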

Bozho answered Sep 21 '22 19:09