
How do I get the decimal value of a unicode character in Java?

Tags: java, unicode

I need a programmatic way to get the decimal value of each character in a String, so that I can encode them as HTML entities, for example:

UTF-8:

著者名

Decimal:

&#33879;&#32773;&#21517;
Mike Sickler asked Jul 20 '11 18:07



1 Answer

I suspect you're just interested in a conversion from char to int, which is implicit:

for (int i = 0; i < text.length(); i++)
{
    char c = text.charAt(i);
    int value = c;
    System.out.println(value);
}
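
As a rough illustration of what that loop produces, the sketch below collects the values into a list instead of printing them. The class and method names (CodeUnits, codeUnits) are made up for this example. Note that a Java char is a UTF-16 code unit, so a character outside the Basic Multilingual Plane (U+1F600 is used here as a sample) comes out as two surrogate values rather than one code point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
final class CodeUnits {
    // Returns the decimal value of each UTF-16 code unit (Java char) in text.
    static List<Integer> codeUnits(String text) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            values.add((int) c); // char widens to int without loss
        }
        return values;
    }

    public static void main(String[] args) {
        // "\u8457\u8005\u540D" is the sample string from the question.
        System.out.println(codeUnits("\u8457\u8005\u540D")); // [33879, 32773, 21517]
        // U+1F600 is stored as a surrogate pair, so two values are printed.
        System.out.println(codeUnits("\uD83D\uDE00"));       // [55357, 56832]
    }
}
```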

EDIT: If you want to handle surrogate pairs, you can use something like:

for (int i = 0; i < text.length(); i++)
{
    int codePoint = text.codePointAt(i);
    // Skip over the second char in a surrogate pair
    if (codePoint > 0xffff)
    {
        i++;
    }
    System.out.println(codePoint);
}
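
To connect this back to the question's goal of HTML entities, here is a minimal sketch that walks the string by code point and emits decimal numeric character references. The class and method names (HtmlEntities, encode) are hypothetical, and the choice to leave ASCII untouched is an assumption of this example. It does not escape &, < or >, so it is not a complete HTML escaper. It uses Character.charCount to step over both halves of a surrogate pair, which is equivalent to the codePoint > 0xffff check above.

```java
// Hypothetical helper class, not part of the original answer.
final class HtmlEntities {
    // Replaces every non-ASCII code point with a decimal reference such as &#33879;
    // ASCII characters are copied through unchanged (an assumption of this sketch).
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by 1 for BMP characters, by 2 for surrogate pairs.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The sample string from the question, written with Unicode escapes.
        System.out.println(encode("\u8457\u8005\u540D")); // &#33879;&#32773;&#21517;
    }
}
```

On Java 8 and later, text.codePoints() gives the same sequence of code points as an IntStream, which avoids the manual index handling.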
Jon Skeet answered Nov 04 '22 14:11