 

JAVA: get UTF-8 Hex values from a string?

I would like to be able to convert a raw UTF-8 string to a Hex string. In the example below I've created a sample UTF-8 string containing 2 letters. Then I'm trying to get the Hex values but it gives me negative values.

How can I make it print 05D0 and 05D1 instead?

String a = "\u05D0\u05D1";
byte[] xxx = a.getBytes("UTF-8");

for (byte x : xxx) {
   System.out.println(Integer.toHexString(x));
}

Thank you.

thedp asked Mar 14 '12 17:03


2 Answers

Don't convert to an encoding like UTF-8 if you want the code point. Use Character.codePointAt.

For example:

Character.codePointAt("\u05D0\u05D1", 0) // returns 1488, or 0x5d0
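To print every code point in the question's format, something along these lines should work. This is a minimal sketch, assuming Java 8 or later for String.codePoints(); the class name CodePointHex and the helper toHex are made up for illustration.

```java
import java.util.stream.Collectors;

public class CodePointHex {
    // Hypothetical helper: formats each code point of s as at least
    // four uppercase hex digits, separated by single spaces.
    static String toHex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // The two Hebrew letters from the question.
        System.out.println(toHex("\u05D0\u05D1")); // prints "05D0 05D1"
    }
}
```

Iterating over code points rather than chars means characters outside the Basic Multilingual Plane come out as one value each instead of as a surrogate pair.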

ataylor answered Sep 16 '22 22:09


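Negative values occur because byte is a signed type in Java, with a range of -128 to 127. Integer.toHexString takes an int, so a negative byte is sign-extended first and the byte 0xD7 prints as ffffffd7. The following code will produce positive values: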

String a = "\u05D0\u05D1";
byte[] xxx = a.getBytes("UTF-8");

for (byte x : xxx) {
    System.out.println(Integer.toHexString(x & 0xFF));
}

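The only difference is that it passes x & 0xFF instead of just x. The mask promotes the byte to an int and keeps only the low 8 bits, so the value always falls in the range 0 to 255. Note that this prints the UTF-8 encoded bytes (d7, 90, d7, 91), not the code points 05D0 and 05D1.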
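The same masking idea can be wrapped in a small helper that returns the bytes as one string. This is only a sketch, assuming Java 7 or later for StandardCharsets; the names Utf8BytesHex and toHex are hypothetical. Using StandardCharsets.UTF_8 instead of the string "UTF-8" also avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesHex {
    // Hypothetical helper: returns the UTF-8 bytes of s as two-digit
    // uppercase hex values separated by single spaces.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // b & 0xFF widens the signed byte to an int in the range 0..255.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("\u05D0\u05D1")); // prints "D7 90 D7 91"
    }
}
```

Each of the two letters takes two bytes in UTF-8, which is why four values are printed for a two-character string.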

Malcolm answered Sep 17 '22 22:09