I need a programmatic way to get the decimal value of each character in a String, so that I can encode them as HTML entities, for example:
UTF-8:
著者名
Decimal:
&#33879;&#32773;&#21517;
We can determine the Unicode general category of a character with the getType() method. It is a static method of the Character class, and it returns an int constant (for example Character.UPPERCASE_LETTER) that identifies the general category of the given char or code point.
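As a rough illustration of getType(), the sketch below maps a handful of the returned constants to readable names. The class name UnicodeCategoryDemo and the helper categoryName are made up for this example, and only a few categories are covered.

```java
class UnicodeCategoryDemo {
    // Hypothetical helper: turns a few Character.getType results into names.
    static String categoryName(int codePoint) {
        int type = Character.getType(codePoint);
        switch (type) {
            case Character.UPPERCASE_LETTER:     return "UPPERCASE_LETTER";
            case Character.LOWERCASE_LETTER:     return "LOWERCASE_LETTER";
            case Character.OTHER_LETTER:         return "OTHER_LETTER";
            case Character.DECIMAL_DIGIT_NUMBER: return "DECIMAL_DIGIT_NUMBER";
            case Character.SPACE_SEPARATOR:      return "SPACE_SEPARATOR";
            default:                             return "OTHER(" + type + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(categoryName('A'));      // prints UPPERCASE_LETTER
        System.out.println(categoryName('7'));      // prints DECIMAL_DIGIT_NUMBER
        System.out.println(categoryName('\u8457')); // prints OTHER_LETTER
    }
}
```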
Unicode character literals: to write a Unicode character in Java source, use the escape sequence \u followed by four hexadecimal digits, for example \u8457. Unicode escapes can be used anywhere in Java code, including in identifiers, as long as the resulting character is legal at that position.
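A minimal sketch of Unicode escapes in a string literal and in an identifier follows. The class name UnicodeEscapeDemo and the constant AUTHOR_NAME are invented for the example; the three escapes spell the characters from the question.

```java
class UnicodeEscapeDemo {
    // Escapes for U+8457, U+8005 and U+540D, the three characters in the question.
    static final String AUTHOR_NAME = "\u8457\u8005\u540D";

    public static void main(String[] args) {
        // The escape for U+0061 is the letter 'a', so this declares a variable named a.
        int \u0061 = 42;
        System.out.println(a);                           // prints 42
        System.out.println(AUTHOR_NAME.length());        // prints 3
        System.out.println((int) AUTHOR_NAME.charAt(0)); // prints 33879
    }
}
```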
Get Unicode character code in Java: here is the definition of char from Oracle: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive). Code points above U+FFFF do not fit in one char and are stored as two char values (a surrogate pair).
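To make the 16-bit limit concrete, here is a small sketch that prints the char range and counts how many char values a code point needs. CharRangeDemo and charsNeeded are illustrative names, not part of any library.

```java
class CharRangeDemo {
    // Hypothetical helper: number of char values used to store one code point.
    static int charsNeeded(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        System.out.println((int) Character.MIN_VALUE); // prints 0
        System.out.println((int) Character.MAX_VALUE); // prints 65535
        System.out.println(charsNeeded(0x8457));       // prints 1 (fits in one char)
        System.out.println(charsNeeded(0x1F600));      // prints 2 (surrogate pair)
    }
}
```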
I suspect you're just interested in a conversion from char to int, which is implicit:
for (int i = 0; i < text.length(); i++)
{
    char c = text.charAt(i);
    int value = c;
    System.out.println(value);
}
EDIT: If you want to handle surrogate pairs, you can use something like:
for (int i = 0; i < text.length(); i++)
{
    int codePoint = text.codePointAt(i);
    // Skip over the second char in a surrogate pair
    if (codePoint > 0xffff)
    {
        i++;
    }
    System.out.println(codePoint);
}
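Applying the code point loop to the goal in the question, a sketch of an encoder for decimal HTML entities could look like the following. The class HtmlEntityEncoder and method toDecimalEntities are hypothetical names, and the choice to leave ASCII untouched (so characters such as < and & are not escaped) is an assumption made for brevity.

```java
class HtmlEntityEncoder {
    // Writes each non-ASCII code point as a decimal numeric character reference.
    // Assumption: ASCII is passed through unchanged, including < > & and quotes.
    static String toDecimalEntities(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by 1 char normally, or by 2 for a surrogate pair.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints &#33879;&#32773;&#21517;
        System.out.println(toDecimalEntities("\u8457\u8005\u540D"));
    }
}
```

Character.charCount(codePoint) does the same job as the explicit check against 0xffff, just without the hard-coded constant.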