How to convert Vietnamese text to normal text?

Tags: java, android

I have some Vietnamese text like this:

String text = "Xin chào Việt Nam";

I want to convert it to plain text without accents. My expected result:

String result = "Xin chao Viet Nam";

How can I do that? Thanks.

CauCuKien asked Feb 07 '23 02:02


1 Answer

You're looking for java.text.Normalizer. It maps between accented Unicode characters and their decompositions: normalizing to form NFD converts each accented character into its unaccented base letter followed by its combining diacritical marks. You can then use a regex to strip off the diacritics.

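        import java.text.Normalizer;
        import java.util.regex.Pattern;

        public class DeAccent {

            public static void main(String[] args) {
                System.out.println(deAccent("Xin chào Việt Nam")); // prints "Xin chao Viet Nam"
            }

            public static String deAccent(String str) {
                // NFD splits each accented character into a base letter plus combining marks.
                String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
                // Remove the combining marks (block U+0300 to U+036F), keeping the base letters.
                Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
                return pattern.matcher(nfdNormalizedString).replaceAll("");
            }
        }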
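
One caveat for Vietnamese specifically: the letters đ (U+0111) and Đ (U+0110) have no canonical decomposition, so NFD leaves them untouched and the regex does not strip them. The sample text in the question does not contain them, but a string such as "Đà Nẵng" does. Below is a minimal sketch of one way to handle this, assuming a plain "d"/"D" is the replacement you want. The class name VietnameseDeAccent and the method name deAccentVietnamese are made up for illustration and are not part of any library. The code uses Unicode escapes instead of literal accented characters so that it compiles regardless of the source file encoding.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class VietnameseDeAccent {

    // Combining marks produced by NFD decomposition (block U+0300 to U+036F).
    // Compiled once and reused, since Pattern instances are thread-safe.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Hypothetical helper name, chosen for this example only.
    public static String deAccentVietnamese(String str) {
        String decomposed = Normalizer.normalize(str, Normalizer.Form.NFD);
        String stripped = DIACRITICS.matcher(decomposed).replaceAll("");
        // U+0111 and U+0110 (d with stroke) do not decompose under NFD,
        // so they are mapped by hand. Assumption: plain d/D is acceptable.
        return stripped.replace('\u0111', 'd').replace('\u0110', 'D');
    }

    public static void main(String[] args) {
        // Escaped form of the Vietnamese spelling of "Da Nang, Viet Nam".
        String input = "\u0110\u00e0 N\u1eb5ng, Vi\u1ec7t Nam";
        System.out.println(deAccentVietnamese(input)); // prints "Da Nang, Viet Nam"
    }
}
```

Without the final replace step, the same input would come out with the leading Đ still in place.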
Ahmed Gamal answered Feb 09 '23 14:02