How do I detect unicode characters in a Java string?

Suppose I have a string that contains Ü. How would I find all those unicode characters? Should I test for their code? How would I do that?

For example, given the string "AÜXÜ", I'd like to transform it to "AYXY". I'd like to do the same for other unicode characters, and I would hate to have to store them in a translation map of some sort.

asked Nov 04 '09 12:11

Geo

2 Answers

You could loop through your string and, for every character c, check its Unicode block:

    if (Character.UnicodeBlock.of(c) != Character.UnicodeBlock.BASIC_LATIN) {
        // replace with Y
    }
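To make the check above concrete, here is a minimal sketch of the full loop. The class and method names (NonBasicLatinReplacer, replaceNonBasicLatin) are made up for illustration, and iterating by code point rather than by char is an assumption added so that characters outside the Basic Multilingual Plane are replaced once instead of twice.

```java
final class NonBasicLatinReplacer {

    // Replaces every code point outside the BASIC_LATIN block with `replacement`.
    // Hypothetical helper, not part of the original answer.
    static String replaceNonBasicLatin(String input, char replacement) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            // UnicodeBlock.of returns null for unassigned blocks; those are replaced too.
            if (Character.UnicodeBlock.of(cp) != Character.UnicodeBlock.BASIC_LATIN) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00DC is LATIN CAPITAL LETTER U WITH DIAERESIS.
        System.out.println(replaceNonBasicLatin("A\u00DCX\u00DC", 'Y')); // prints AYXY
    }
}
```

Note that this replaces everything outside ASCII with the same character, so it only fits the question if a single placeholder is acceptable.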

answered Sep 24 '22 12:09

jitter

The definition of "unicode characters" is vague, but will be taken here to mean characters not covered by the standard ISO 8859 charset. If that is what you mean, then loop through all characters in the String and test each code point to determine whether it falls within the given character set.
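One possible way to run that test is to ask a CharsetEncoder whether it can encode each code point. The sketch below assumes this approach; the class and method names (CharsetCoverage, codePointsOutside) are hypothetical. Note that Ü itself is encodable in ISO-8859-1, so it is only reported when checking against a narrower charset such as US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CharsetCoverage {

    // Returns the code points of `input` that `charset` cannot encode.
    // Hypothetical helper for illustration only.
    static List<Integer> codePointsOutside(String input, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        List<Integer> outside = new ArrayList<>();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            // canEncode(CharSequence) handles surrogate pairs as one unit.
            if (!encoder.canEncode(new String(Character.toChars(cp)))) {
                outside.add(cp);
            }
            i += Character.charCount(cp);
        }
        return outside;
    }

    public static void main(String[] args) {
        // A, U with diaeresis (U+00DC), X, euro sign (U+20AC)
        String s = "A\u00DCX\u20AC";
        System.out.println(codePointsOutside(s, StandardCharsets.US_ASCII));   // prints [220, 8364]
        System.out.println(codePointsOutside(s, StandardCharsets.ISO_8859_1)); // prints [8364]
    }
}
```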

Alternatively, use a Map<Character, Character> and replace the characters in the string that match its keys. For example:

    import java.util.HashMap;
    import java.util.Map;

    // Maps each character to replace onto its replacement.
    Map<Character, Character> charReplacementMap = new HashMap<Character, Character>() {{
        put('Ü', 'Y');
        // Put more here.
    }};

    String originalString = "AÜAÜ";
    StringBuilder builder = new StringBuilder();

    for (char currentChar : originalString.toCharArray()) {
        // Keep the original character when the map has no entry for it.
        Character replacementChar = charReplacementMap.get(currentChar);
        builder.append(replacementChar != null ? replacementChar : currentChar);
    }

    String newString = builder.toString();

Or, do you mean "all characters with diacritics"? If so, then use java.text.Normalizer to remove diacritical marks:

    import java.text.Normalizer;
    import java.text.Normalizer.Form;

    /**
     * Remove any diacritical marks (accents like ç, ñ, é, etc) from
     * the given string (so that it returns plain c, n, e, etc).
     * @param string The string to remove diacritical marks from.
     * @return The string with removed diacritical marks, if any.
     */
    public static String removeDiacriticalMarks(String string) {
        return Normalizer.normalize(string, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

One pitfall: Ü would become U, not Y. Not sure if that's what you're after. If you want to replace each character by how it is pronounced, you'll really need to create a mapping. Sure, it's tedious work, but it takes less time than you needed to follow this topic.
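If only a few characters need a pronunciation-based replacement, the mapping and the Normalizer approach can be combined. The following is a rough sketch under that assumption, not something the answer prescribes: the class and method names (DiacriticFallback, transliterate) are hypothetical, explicit map entries win, and everything else falls back to having its diacritical marks stripped.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

final class DiacriticFallback {

    // Applies explicit overrides first, then strips diacritical marks from the rest.
    // Hypothetical helper for illustration only.
    static String transliterate(String input, Map<Character, Character> overrides) {
        // Compose first so that "U" + combining diaeresis matches the U+00DC key.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (char c : composed.toCharArray()) {
            Character mapped = overrides.get(c);
            out.append(mapped != null ? mapped : c);
        }
        // Decompose what is left and drop the combining marks.
        return Normalizer.normalize(out, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        Map<Character, Character> overrides = new HashMap<>();
        overrides.put('\u00DC', 'Y'); // U with diaeresis becomes Y, as in the question
        // U+00E9 (e with acute) has no override, so it falls back to plain e.
        System.out.println(transliterate("A\u00DCX\u00E9", overrides)); // prints AYXe
    }
}
```

The map then only has to list the exceptions, while unlisted accented letters still come out as their base letter.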

answered Sep 24 '22 12:09

BalusC
