import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class Main {
    public static void main(String[] args)
    {
        try
        {
            String s = "s";
            System.out.println( Arrays.toString( s.getBytes("utf8") ) );
            System.out.println( Arrays.toString( s.getBytes("utf16") ) );
            System.out.println( Arrays.toString( s.getBytes("utf32") ) );
        }
        catch (UnsupportedEncodingException e)
        {
            e.printStackTrace();
        }
    }
}
Console:
[115]
[-2, -1, 0, 115]
[0, 0, 0, 115]
What are the leading [-2, -1] bytes in the UTF-16 output?
Also, I noticed the same prefix when I do this:
String s = new String(new char[]{'\u1251'});
System.out.println( Arrays.toString( s.getBytes("utf8") ) );
System.out.println( Arrays.toString( s.getBytes("utf16") ) );
System.out.println( Arrays.toString( s.getBytes("utf32") ) );
Console:
[-31, -119, -111]
[-2, -1, 18, 81]
[0, 0, 18, 81]
The getBytes() method in Java encodes a string into a sequence of bytes and returns them as a byte array. Syntax: public byte[] getBytes(). Called without arguments, it uses the platform's default charset; pass a charset name or a Charset to choose the encoding.
To convert a single byte value to a string, one option is to append it to a string variable with the + operator, which converts the byte to its decimal text form. The simplest way is the valueOf() method of the String class.
There are two ways to convert a byte array to a String: by using the String class constructor with the platform's default charset, or by passing an explicit encoding such as UTF-8 to that constructor.
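As a minimal sketch of the second option, the round trip below encodes a string to UTF-8 and decodes it again with an explicit charset. The class name RoundTrip and the helper decodeUtf8 are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTrip {
    // Hypothetical helper: decodes UTF-8 bytes with an explicit charset,
    // so the result does not depend on the platform default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u1251";
        byte[] encoded = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(encoded)); // [-31, -119, -111]
        System.out.println(decodeUtf8(encoded).equals(original)); // true
    }
}
```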
Don't forget that bytes are signed in Java. So -2, -1 really means 0xfe 0xff... and U+FEFF is the Unicode byte order mark (BOM)... that's what you're seeing here in the UTF-16 version.
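To see the unsigned values, mask each byte with 0xFF before printing. This is only a sketch; the class name SignedBytes and its helpers are invented for the example.

```java
import java.nio.charset.StandardCharsets;

public class SignedBytes {
    // Converts a signed Java byte (-128..127) to its unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Formats a byte array as space-separated two-digit hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", toUnsigned(b)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf16 = "s".getBytes(StandardCharsets.UTF_16);
        System.out.println(toHex(utf16)); // FE FF 00 73
    }
}
```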
To avoid getting the BOM when encoding, use UTF-16BE or UTF-16LE explicitly. (I would also suggest using the names which are guaranteed by the platform rather than just "utf8" etc. Admittedly the name is guaranteed to be found case-insensitively, but the lack of a hyphen makes it less reliable, and there are no downsides to using the canonical name.)
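For example, the constants in java.nio.charset.StandardCharsets name the byte order explicitly, and getBytes(Charset) does not throw the checked UnsupportedEncodingException. A small sketch, with the class name NoBom and its helper methods chosen just for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NoBom {
    // Big-endian UTF-16, no BOM.
    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    // Little-endian UTF-16, no BOM.
    static byte[] utf16le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String s = "s";
        System.out.println(Arrays.toString(utf16be(s))); // [0, 115]
        System.out.println(Arrays.toString(utf16le(s))); // [115, 0]
        // Plain UTF-16 prepends the BOM, as in the question.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16))); // [-2, -1, 0, 115]
    }
}
```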
The -2, -1 is a Byte Order Mark (BOM, U+FEFF) that indicates that the following text is encoded in UTF-16 and in which byte order.
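You are getting this because there are two UTF-16 byte orders, UTF-16LE and UTF-16BE, in which the two bytes of each 16-bit value are stored in little-endian or big-endian order. When you ask for plain UTF-16, Java writes a BOM so that a reader can tell which order was used. UTF-8 has only one byte layout, so no BOM is needed there. UTF-32 also has big-endian and little-endian variants, but as the output above shows, Java's UTF-32 encoder writes big-endian bytes without a BOM.
As the values that come back are 0xFE 0xFF, the encoding is big-endian (UTF-16BE).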
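The same check can be sketched in code. The helper below is hypothetical and assumes the input is already known to be UTF-16; it only looks at the first two bytes. When decoding, Java's UTF-16 charset reads the BOM itself and picks the matching byte order.

```java
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    // Hypothetical helper: reports the byte order implied by a leading UTF-16 BOM.
    static String detect(byte[] bytes) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF; // mask because Java bytes are signed
            int b1 = bytes[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return "UTF-16BE";
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return "UTF-16LE";
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] big = {-2, -1, 0, 115};    // FE FF 00 73
        byte[] little = {-1, -2, 115, 0}; // FF FE 73 00
        System.out.println(detect(big));    // UTF-16BE
        System.out.println(detect(little)); // UTF-16LE
        // The UTF-16 decoder consumes the BOM and yields "s" in both cases.
        System.out.println(new String(big, StandardCharsets.UTF_16));
        System.out.println(new String(little, StandardCharsets.UTF_16));
    }
}
```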