
Get unicode value of a character

Tags:

java

unicode

Is there a way in Java to get the Unicode equivalent of any character? For example:

Suppose there is a method getUnicode(char c). A call to getUnicode('÷') should return \u00f7.

Saurabh asked Feb 08 '10 08:02




2 Answers

You can do this for any Java char with the following one-liner:

System.out.println( "\\u" + Integer.toHexString('÷' | 0x10000).substring(1) ); 

But this only works for characters in the Basic Multilingual Plane (U+0000 to U+FFFF), which covers everything defined up to Unicode 3.0. That is why I specified that you can do it for any Java char.

Java was designed well before Unicode 3.1, which introduced characters beyond U+FFFF, so Java's 16-bit char primitive cannot represent every character in Unicode 3.1 and up: there is no longer a "one Unicode character to one Java char" mapping. Instead, characters above U+FFFF are stored as surrogate pairs of two chars (a monstrous hack).

So you really have to check your requirements here: do you need to support a Java char or any possible Unicode character?
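If the requirement turns out to be "any possible Unicode character", one option is to escape each UTF-16 code unit of the code point, so a supplementary character comes out as two escapes. This is only a minimal sketch, assuming Java 5 or later; the names UnicodeEscapes, escapeChar and escapeCodePoint are made up for illustration and are not part of the answer above.

```java
final class UnicodeEscapes {

    // Escape a single Java char (one UTF-16 code unit) as backslash-u plus 4 hex digits.
    static String escapeChar(char c) {
        return String.format("\\u%04x", (int) c);
    }

    // Escape any Unicode code point. Character.toChars returns one char for
    // BMP code points and a surrogate pair (two chars) for supplementary ones,
    // so the result holds either one or two escapes.
    static String escapeCodePoint(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(escapeChar(unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeChar('\u00f7'));      // the division sign
        System.out.println(escapeCodePoint(0x1F600));  // a code point above U+FFFF
    }
}
```

With these inputs the first call gives \u00f7 and the second gives \ud83d\ude00, which is how Java source would spell U+1F600. Character.toChars throws IllegalArgumentException for values that are not valid code points, so the sketch does no validation of its own.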

SyntaxT3rr0r answered Sep 20 '22 06:09



If you have Java 5 or later, use char c = ...; String s = String.format("\\u%04x", (int) c);

If your source isn't a single char but a String, use charAt(index) to get the char (a 16-bit UTF-16 code unit) at position index.

Don't use codePointAt(index), because it returns a full Unicode code point. Code points go up to U+10FFFF (21 bits), so they can't always be represented with just 4 hex digits (they may need up to 6). See the docs for an explanation.

[EDIT] To make it clear: this answer doesn't work on Unicode code points but on the UTF-16 code units Java uses to represent Unicode characters (characters above U+FFFF are stored as surrogate pairs), since char is 16 bits and a Unicode code point needs up to 21 bits. The question should really be "How can I convert a char to a 4-digit hex number?", since it's not (really) about Unicode.
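To see the difference between the two views of a String, here is a rough sketch that walks the same text once with charAt and once with codePointAt. The class and method names (CodeUnitsVsCodePoints, escapeCodeUnits, describeCodePoints) are hypothetical, and the U+XXXX label format is just a choice made for this example.

```java
final class CodeUnitsVsCodePoints {

    // One 4-digit escape per UTF-16 code unit, i.e. what charAt(index) sees.
    static String escapeCodeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // One U+ label per code point, i.e. what codePointAt(index) sees.
    // Labels have at least 4 hex digits and grow to 5 or 6 when needed.
    static String describeCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by 1 for BMP code points, by 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The division sign followed by one code point above U+FFFF.
        String s = "\u00f7" + new String(Character.toChars(0x1F600));
        System.out.println(escapeCodeUnits(s));
        System.out.println(describeCodePoints(s));
    }
}
```

For this input the char-based walk yields three escapes (\u00f7\ud83d\ude00) while the code point walk yields two labels (U+00F7 U+1F600), which is the reason a fixed 4-digit format only fits the char view.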

Aaron Digulla answered Sep 22 '22 06:09
