How to get a character by its (Unicode) name in Java? I need the reverse of Character.getName(int codePoint)

Tags: java, unicode

How do I look up a character or int codepoint in Java using its Unicode name?

For example, if

Character.getName('\u00e4')

returns "LATIN SMALL LETTER A WITH DIAERESIS", how do I perform the reverse operation (i.e. go from "LATIN SMALL LETTER A WITH DIAERESIS" to '\u00e4') using "plain" Java?

Edit: To stop the torrent of comments about what I do or don't want, here is what I would do in Python:

"\N{LATIN SMALL LETTER A WITH DIAERESIS}" # this gives me what I want as a literal

unicodedata.lookup("LATIN SMALL LETTER A WITH DIAERESIS") # a dynamic version

Now, the question is: do the same in Java.

And, BTW, I don't want to "print unicode escapes". Getting the hex value for a char is easy; what I want is the char bearing a given name.

In other words, I want to do the reverse of what Character.getName(int) does.

asked May 15 '14 07:05 by Piotr Findeisen


1 Answer

The ICU4J library can help you here. It has a class UCharacter with getCharFromName and other related methods that can map from various types of character name strings back to the int code points they represent.
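If adding ICU4J is not an option, newer JDKs have a standard-library counterpart that is not part of the original answer: since Java 9, Character.codePointOf(String) maps a Unicode name back to its code point and throws IllegalArgumentException for names it does not recognize. A minimal sketch, assuming Java 9 or later; the class name UnicodeNames, the method lookup, and the -1 "not found" convention are made up for illustration:

```java
// Sketch only: requires Java 9 or later. Class and method names are hypothetical.
final class UnicodeNames {
    private UnicodeNames() {}

    /** Returns the code point with the given Unicode name, or -1 if the JDK rejects the name. */
    static int lookup(String name) {
        try {
            return Character.codePointOf(name); // inverse of Character.getName(int)
        } catch (IllegalArgumentException e) {
            return -1; // not a recognized character name
        }
    }

    public static void main(String[] args) {
        int cp = lookup("LATIN SMALL LETTER A WITH DIAERESIS");
        // Convert the code point to text; toChars also handles supplementary characters.
        System.out.println(new String(Character.toChars(cp)));
        System.out.printf("U+%04X%n", cp);
    }
}
```

Returning an int rather than a char keeps the sketch usable for characters outside the Basic Multilingual Plane, which do not fit in a single Java char.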

However, if you are working with hard-coded character names (i.e. quoted string literals in the source code), it would be far more efficient to do the translation once: use the \u escape in the source code and add a comment with the full name if necessary, rather than incur the cost of parsing the name tables at runtime every time. If the character names come from reading a file or similar, then obviously you will have to convert at runtime.
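To make the two cases concrete, here is a rough sketch using only the JDK (Java 7 or later). It is not the ICU4J approach described above: the constant shows the "translate once" option, and the map inverts Character.getName a single time so that names arriving at runtime are not resolved from scratch on every call. The names ReverseNameIndex, A_DIAERESIS and codePointFor are hypothetical, as is the choice of -1 for "not found".

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: all names here are hypothetical. Uses nothing beyond the JDK.
final class ReverseNameIndex {
    // Name known at compile time: write the escape directly, keep the name as a comment.
    static final char A_DIAERESIS = '\u00e4'; // LATIN SMALL LETTER A WITH DIAERESIS

    // Names known only at runtime: invert Character.getName once, when the class loads.
    private static final Map<String, Integer> BY_NAME = buildIndex();

    private ReverseNameIndex() {}

    private static Map<String, Integer> buildIndex() {
        Map<String, Integer> index = new HashMap<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            String name = Character.getName(cp); // null for unassigned code points
            if (name != null) {
                index.put(name, cp);
            }
        }
        return index;
    }

    /** Returns the code point whose Character.getName value equals the given name, or -1. */
    static int codePointFor(String name) {
        Integer cp = BY_NAME.get(name);
        return cp == null ? -1 : cp;
    }
}
```

The index covers every assigned code point, so it costs noticeable memory and start-up time; it only pays off when many names have to be resolved. Lookups are exact matches against the strings that Character.getName produces.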

answered Oct 02 '22 00:10 by Ian Roberts