
UTF-16BE to UTF-16LE, and back

I'm working on a BlackBerry project and need to convert byte arrays of strings encoded as UTF-16LE (little endian) into byte arrays in the UTF-16BE (big endian) encoding, and vice versa. The server I'm connecting to sends the BlackBerry device byte arrays of strings encoded as UTF-16LE, but the device doesn't natively support UTF-16LE, so when I try to decode the byte arrays back into strings, the strings are illegible. The device does, however, support UTF-16BE. I also need to reverse this process, i.e. convert a byte array of a UTF-16BE-encoded string into what the server is expecting (UTF-16LE). Thanks.

I cannot do this on the device:

String test = "test";
byte[] testBytes = test.getBytes("UTF-16LE"); // throws UnsupportedEncodingException

I can do this:

String test = "test";
byte[] testBytes = test.getBytes("UTF-16BE"); // works
RyanM asked Aug 24 '12 01:08



1 Answer

UTF-16 uses two bytes per code unit. Some Unicode code points are encoded as a single code unit, and others (those above U+FFFF) are encoded as two code units, called a surrogate pair.

To convert between UTF-16LE and UTF-16BE, simply loop through the bytes and swap the two bytes of each code unit. The order of the code units themselves, including the two halves of a surrogate pair, does not change between LE and BE; only the byte order inside each code unit does. In other words, swap bytes 0 and 1 with each other, swap bytes 2 and 3 with each other, and so on. The same swap converts in either direction.
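
The byte swap described above can be sketched in Java roughly as follows. This is a minimal sketch, not a BlackBerry API: the class name `Utf16ByteOrder` and the method name `swap` are made up for illustration, and the code assumes the input holds a whole number of 2-byte code units.

```java
// Hypothetical helper: the class and method names are illustrative only.
final class Utf16ByteOrder {
    private Utf16ByteOrder() {}

    /**
     * Returns a copy of the input with the two bytes of every 16-bit code unit
     * swapped. Applying it to UTF-16LE data yields UTF-16BE data and vice versa.
     * Assumption: the input length is even (complete code units only).
     */
    static byte[] swap(byte[] input) {
        if (input.length % 2 != 0) {
            throw new IllegalArgumentException("UTF-16 data must have an even number of bytes");
        }
        byte[] output = new byte[input.length];
        for (int i = 0; i < input.length; i += 2) {
            // Swap bytes 0 and 1, then 2 and 3, and so on.
            output[i] = input[i + 1];
            output[i + 1] = input[i];
        }
        return output;
    }
}
```

With a helper like this, data from the server could be decoded with `new String(Utf16ByteOrder.swap(bytesFromServer), "UTF-16BE")`, and outgoing text produced with `Utf16ByteOrder.swap(text.getBytes("UTF-16BE"))`. If the data starts with a byte order mark, it gets swapped along with everything else (FF FE becomes FE FF), so it may need to be stripped separately.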

Remy Lebeau answered Sep 22 '22 14:09
