How to replace Latin Unicode characters with [a-z] characters

I'm trying to convert all accented Latin Unicode characters into their [a-z] equivalents:

ó --> o
í --> i

I can easily replace them one by one, for example:

myString = myString.replaceAll("ó","o");

But since there are tons of variations, this approach is impractical.

Is there another way of doing this in Java, for example a regular expression or a utility library?

USE CASE:

1- City names from other languages converted into English, e.g.

Espírito Santo --> Espirito Santo

223

asked Sep 22 '15 13:09

nafas

1 Answer

This answer requires Java 1.6 or above, which added java.text.Normalizer.

    String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
    String accentRemoved = normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
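
To see why these two lines work, here is a minimal sketch that prints the code points of "ó" before and after NFD normalization. The class name `NfdDemo` and the helper `codePoints` are illustrative, not part of any library. NFD splits the precomposed letter into the base letter `o` plus the combining acute accent U+0301, and the regular expression then removes everything in the Combining Diacritical Marks block.

```java
import java.text.Normalizer;

final class NfdDemo {
    // Formats each code point of s as U+XXXX, separated by single spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String composed = "\u00F3"; // ó as a single precomposed character
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);
        System.out.println(codePoints(composed));   // U+00F3
        System.out.println(codePoints(decomposed)); // U+006F U+0301
    }
}
```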

Example:

import java.text.Normalizer;

public class Main {
    public static void main(String[] args) {
        String input = "Árvíztűrő tükörfúrógép";
        System.out.println("Input: " + input);
        // NFD splits each accented letter into a base letter plus combining marks
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
        System.out.println("Normalized: " + normalized);
        // Remove the combining marks, leaving only the base letters
        String accentRemoved = normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        System.out.println("Result: " + accentRemoved);
    }
}

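Result:

Input: Árvíztűrő tükörfúrógép
Normalized: Árvíztűrő tükörfúrógép
Result: Arvizturo tukorfurogep

The Normalized line renders the same as the input, but each accented letter is now stored as a base letter followed by a combining mark.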
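
For the city-name use case in the question, the same two steps can be wrapped in a small reusable helper. This is a sketch only: `AccentStripper` and `stripAccents` are hypothetical names, and the pattern is precompiled simply to avoid recompiling the regex on every call.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class AccentStripper {
    // Matches runs of characters from the Combining Diacritical Marks block.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Decomposes the input with NFD, then drops the combining marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return DIACRITICS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("Espírito Santo")); // Espirito Santo
        System.out.println(stripAccents("Łódź"));           // Łodz
    }
}
```

Note the second call: letters that have no canonical decomposition, such as Ł, ø, ß or æ, pass through unchanged. If the output must be strictly [a-z], those would need explicit replacements on top of this approach.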

186

answered Sep 21 '22 23:09

EpicPandaForce


Tags: java, string, regex, unicode, normalization
