How do I look up a character or int codepoint in Java using its Unicode name?
For example, if Character.getName('\u00e4') returns "LATIN SMALL LETTER A WITH DIAERESIS", how do I perform the reverse operation (i.e. go from "LATIN SMALL LETTER A WITH DIAERESIS" to '\u00e4') using "plain" Java?
Edit: To stop the torrent of comments about what I do or don't want, here is what I would do in Python:
"\N{LATIN SMALL LETTER A WITH DIAERESIS}" # this gives me what I want as a literal
import unicodedata
unicodedata.lookup("LATIN SMALL LETTER A WITH DIAERESIS") # a dynamic version
Now, the question is: do the same in Java.
And, BTW, I don't want to "print unicode escapes" -- getting the hex value of a char is easy, but I want the char that bears a given name.
To put it another way, I want to do the reverse of what Character.getName(int) does.
The ICU4J library can help you here. It has a class UCharacter with getCharFromName and other related methods that map various kinds of character name strings back to the int code points they represent.
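As a rough illustration of the lookup described above, here is a minimal sketch. It assumes the ICU4J jar is on the classpath; the class name NameLookup and the helper codePointForName are made up for this example and are not part of ICU4J.

```java
import com.ibm.icu.lang.UCharacter;

// Hypothetical wrapper class; only UCharacter.getCharFromName comes from ICU4J.
class NameLookup {

    // Returns the code point for the given Unicode character name,
    // or -1 if ICU4J does not know a character with that name.
    static int codePointForName(String name) {
        return UCharacter.getCharFromName(name);
    }

    public static void main(String[] args) {
        int cp = codePointForName("LATIN SMALL LETTER A WITH DIAERESIS");
        if (cp < 0) {
            System.out.println("No character with that name");
            return;
        }
        // The result is an int code point, so convert it via Character.toChars
        // rather than casting to char; this also works outside the BMP.
        String text = new String(Character.toChars(cp));
        System.out.printf("U+%04X %s%n", cp, text);
    }
}
```

Because the result is an int code point rather than a char, checking for -1 before converting is probably worth keeping when the names come from untrusted input.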
However, if you are working with hard-coded character names (i.e. quoted string literals in the source code), it would be far more efficient to do the translation once: use the \u escape in the source code and add a comment with the full name if necessary, rather than incur the cost of parsing the name tables at runtime every time. If the character names come from reading a file or similar, then you will obviously have to convert at runtime.
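For the hard-coded case, the suggestion above might look something like the sketch below. The class and constant names are invented for illustration. The only library call is Character.getName, used here as a sanity check that the escape matches the name in the comment.

```java
// Hypothetical holder class for characters that are known at compile time.
class HardCodedName {

    // LATIN SMALL LETTER A WITH DIAERESIS
    static final char A_WITH_DIAERESIS = '\u00e4';

    public static void main(String[] args) {
        // No name table is parsed to obtain the character itself;
        // getName is only called to confirm the comment above is accurate.
        System.out.println(Character.getName(A_WITH_DIAERESIS));
    }
}
```

The lookup cost is paid once, by whoever writes the constant, and the comment keeps the source readable.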