Suppose I have a string that contains Ü. How would I find all those Unicode characters? Should I test for their code? How would I do that?
For example, given the string "AÜXÜ", I'd like to transform it to "AYXY". I'd like to do the same for other Unicode characters, and I would hate to have to store them in a translation map of some sort.
To check whether a given String contains only Unicode letters, digits or spaces, loop over it with charAt() and test each character with Character.isLetterOrDigit(), plus an explicit check for the space character. The isLetterOrDigit(char ch) method determines whether the specified character is a Unicode letter or digit.
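A minimal sketch of that check follows; the class and method names are hypothetical. Note that Ü counts as a Unicode letter, so this test on its own will not single it out from plain ASCII letters.

```java
// Hypothetical helper class illustrating charAt() + Character.isLetterOrDigit().
class LetterDigitSpaceCheck {
    // Returns true if every char in s is a Unicode letter, a Unicode digit, or ' '.
    static boolean isLettersDigitsOrSpace(String s) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            // isLetterOrDigit accepts any Unicode letter or digit, including U+00DC.
            // It works on UTF-16 chars, so letters outside the BMP (surrogate pairs)
            // are rejected; iterate over codePoints() if that matters.
            if (!Character.isLetterOrDigit(ch) && ch != ' ') {
                return false;
            }
        }
        return true;
    }
}
```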
In Microsoft Word, to insert a Unicode character, type its hexadecimal character code and then press Alt+X. For example, to type a dollar sign ($), type 0024 and then press Alt+X. For more Unicode character codes, see the Unicode character code charts by script.
You can use string.indexOf('a'). If the character 'a' is present in the string, it returns the index of the first occurrence of that character in the character sequence represented by the String object, or -1 if the character does not occur.
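indexOf() only reports the first match, but the two-argument overload indexOf(int ch, int fromIndex) lets you resume the search just past the previous hit, which finds every occurrence. A small sketch, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class illustrating repeated String.indexOf calls.
class FindAllOccurrences {
    // Returns every index at which ch occurs in s, in ascending order.
    static List<Integer> indicesOf(String s, char ch) {
        List<Integer> result = new ArrayList<>();
        int index = s.indexOf(ch);
        while (index != -1) {
            result.add(index);
            // Resume searching one position after the last match.
            index = s.indexOf(ch, index + 1);
        }
        return result;
    }
}
```

For "AÜXÜ" and 'Ü', this yields the indices 1 and 3.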
To check whether a string contains any non-ASCII characters in JavaScript, you can test it against the regex /^[\u0000-\u007f]*$/, which matches only if every character in the string (for example str or str2) is ASCII. ASCII characters have code points ranging from U+0000 to U+007F.
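Java's regex engine understands the same \uXXXX escapes, so the same range can be used from Java with String.matches() and String.replaceAll(). The sketch below (hypothetical class and method names) also negates the character class to replace every non-ASCII character, which turns "AÜXÜ" into "AYXY" without any translation map.

```java
// Hypothetical helper class: the ASCII-range regex, used from Java.
class AsciiCheck {
    // True if s consists only of code points U+0000 to U+007F.
    // String.matches() must match the whole string, so no anchors are needed.
    static boolean isAllAscii(String s) {
        return s.matches("[\\u0000-\\u007F]*");
    }

    // Replaces every code point outside U+0000 to U+007F with the replacement text.
    static String replaceNonAscii(String s, String replacement) {
        return s.replaceAll("[^\\u0000-\\u007F]", replacement);
    }
}
```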
You could loop through your string and, for every character c, check:
if (Character.UnicodeBlock.of(c) != Character.UnicodeBlock.BASIC_LATIN) {
    // replace with Y
}
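A fuller sketch of that loop follows. It iterates over code points rather than chars (an assumption on my part, so that a character outside the Basic Multilingual Plane is replaced once rather than once per surrogate half), and the class and method names are hypothetical.

```java
// Hypothetical helper class built around Character.UnicodeBlock.
class UnicodeBlockReplace {
    // Replaces every code point outside the BASIC_LATIN block
    // (U+0000 to U+007F) with the given replacement character.
    static String replaceNonBasicLatin(String s, char replacement) {
        StringBuilder builder = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (Character.UnicodeBlock.of(cp) != Character.UnicodeBlock.BASIC_LATIN) {
                builder.append(replacement);
            } else {
                builder.appendCodePoint(cp);
            }
        });
        return builder.toString();
    }
}
```

With 'Y' as the replacement, "AÜXÜ" becomes "AYXY", since Ü (U+00DC) belongs to the LATIN_1_SUPPLEMENT block.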
The definition of "Unicode characters" is vague, but here it is taken to mean characters not covered by the standard ISO 8859 character set. If that is true in your case, loop through all characters in the String and test each code point to determine whether it falls within that character set.
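One way to do that test, sketched here under the assumption that "within the character set" means "encodable by it", is CharsetEncoder.canEncode(). Be aware that Ü (U+00DC) is itself part of ISO 8859-1, so with StandardCharsets.ISO_8859_1 the string "AÜXÜ" is left unchanged; passing StandardCharsets.US_ASCII instead replaces it. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class: replace code points a given charset cannot encode.
class CharsetReplace {
    static String replaceUnencodable(String s, Charset charset, char replacement) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder builder = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            // Test the whole code point (one or two chars) against the charset.
            String single = new String(Character.toChars(cp));
            if (encoder.canEncode(single)) {
                builder.appendCodePoint(cp);
            } else {
                builder.append(replacement);
            }
        });
        return builder.toString();
    }
}
```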
Alternatively, use a Map<Character, Character> and replace every character in the string that matches one of the map's keys. For example:
import java.util.HashMap;
import java.util.Map;

Map<Character, Character> charReplacementMap = new HashMap<>();
charReplacementMap.put('Ü', 'Y');
// Put more here.

String originalString = "AÜAÜ";
StringBuilder builder = new StringBuilder();
for (char currentChar : originalString.toCharArray()) {
    // Use the mapped replacement if there is one, otherwise keep the character.
    Character replacementChar = charReplacementMap.get(currentChar);
    builder.append(replacementChar != null ? replacementChar : currentChar);
}
String newString = builder.toString(); // "AYAY"
Or, do you mean "all characters with diacritics"? If so, use java.text.Normalizer to remove the diacritical marks:
import java.text.Normalizer;
import java.text.Normalizer.Form;

/**
 * Remove any diacritical marks (accents like ç, ñ, é, etc) from
 * the given string (so that it returns plain c, n, e, etc).
 * @param string The string to remove diacritical marks from.
 * @return The string with removed diacritical marks, if any.
 */
public static String removeDiacriticalMarks(String string) {
    // NFD splits each accented character into its base letter plus combining marks,
    // which the regex then strips.
    return Normalizer.normalize(string, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
One pitfall: Ü would become U, not Y. Not sure if that's what you're after. If you want to replace characters by how they are pronounced, you'll really need to create a mapping. Sure, it's tedious work, but it takes less time than you have already spent following this topic.
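A sketch of that combination: a small hand-maintained table for characters that need a specific replacement, with the Normalizer approach as the fallback for everything else. The table entries (U+00DC to "Y", U+00FC to "y") and the class and method names are hypothetical; extend the table to suit your language.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: explicit mapping first, diacritic stripping as fallback.
class SimpleTransliterator {
    // Hand-maintained special cases (hypothetical entries; add your own).
    private static final Map<Character, String> SPECIAL_CASES = new HashMap<>();
    static {
        SPECIAL_CASES.put('\u00DC', "Y"); // U with diaeresis, uppercase
        SPECIAL_CASES.put('\u00FC', "y"); // u with diaeresis, lowercase
    }

    static String transliterate(String input) {
        // Compose first, so a decomposed "U + combining diaeresis" is looked up as one char.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder builder = new StringBuilder(composed.length());
        for (char c : composed.toCharArray()) {
            String special = SPECIAL_CASES.get(c);
            builder.append(special != null ? special : String.valueOf(c));
        }
        // Characters not in the table fall back to having their diacritics stripped.
        return Normalizer.normalize(builder, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
}
```

With this table, "AÜXÜ" becomes "AYXY", while "ça va" (not in the table) still falls back to "ca va".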