I have Vietnamese text like this:
String text = "Xin chào Việt Nam";
And I want to convert it to plain text without accents. My expected result:
String result = "Xin chao Viet Nam";
How can I do that? Thanks.
You're looking for java.text.Normalizer. It lets you map between accented Unicode characters and their decompositions. Normalizing to NFD converts each accented character into its unaccented base letter followed by its combining diacritics. You can then use a regex to strip off the diacritics:
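import java.text.Normalizer;
import java.util.regex.Pattern;

public class DeAccent {
    public static void main(String[] args) {
        System.out.println(deAccent("Xin chào Việt Nam")); // prints: Xin chao Viet Nam
    }

    public static String deAccent(String str) {
        // NFD splits each accented letter into a base letter plus combining marks
        String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Remove the combining marks, leaving only the base letters
        Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
        return pattern.matcher(nfdNormalizedString).replaceAll("");
    }
}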
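One caveat for Vietnamese specifically: the letters đ (U+0111) and Đ (U+0110) have no canonical decomposition, so NFD leaves them untouched and the regex does not affect them. The sketch below is one possible way to handle that, assuming you want them mapped to plain d and D. The class name VietnameseDeAccent is hypothetical, and the explicit replace calls are an addition that is not part of the answer above.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class VietnameseDeAccent {
    // Compiled once and reused; matches the combining marks that NFD splits off.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    public static String deAccent(String str) {
        // Step 1: decompose accented letters into base letter + combining marks.
        String decomposed = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Step 2: drop the combining marks.
        String stripped = DIACRITICS.matcher(decomposed).replaceAll("");
        // Step 3 (assumption: d/D is the desired mapping): đ and Đ do not
        // decompose under NFD, so replace them explicitly.
        return stripped.replace('\u0111', 'd').replace('\u0110', 'D');
    }

    public static void main(String[] args) {
        System.out.println(deAccent("Đường đi Việt Nam"));
    }
}
```

If your input never contains đ or Đ, the shorter version above is enough. Keeping the Pattern in a static field simply avoids recompiling it on every call.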