Unknown bytes are returned by the getBytes() method



import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class Main {
 public static void main(String[] args)
 {
  try 
  {
   String s = "s";
   System.out.println( Arrays.toString( s.getBytes("utf8") ) );
   System.out.println( Arrays.toString( s.getBytes("utf16") ) );
   System.out.println( Arrays.toString( s.getBytes("utf32") ) );
  }  
  catch (UnsupportedEncodingException e) 
  {
   e.printStackTrace();
  }
 }
}

Console:


[115]
[-2, -1, 0, 115]
[0, 0, 0, 115]

What are these bytes?

[-2, -1] - ???

Also, I noticed that if I do this:


String s = new String(new char[]{'\u1251'});
System.out.println( Arrays.toString( s.getBytes("utf8") ) );
System.out.println( Arrays.toString( s.getBytes("utf16") ) );
System.out.println( Arrays.toString( s.getBytes("utf32") ) );

Console:


[-31, -119, -111]
[-2, -1, 18, 81]
[0, 0, 18, 81]

What does getBytes return?

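The getBytes() method in Java converts a string into a sequence of bytes and returns them as a byte array. Syntax: public byte[] getBytes(). The no-argument overload encodes with the platform's default charset; the overloads used in the question take a charset name instead.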

How do you convert bytes to strings?

If you only want the numeric value of a single byte as text, you can append it to a string with the + operator or pass it to String.valueOf(). Both give the decimal representation of the byte (for example "115"), not the character it encodes. To turn encoded bytes back into text, decode the whole byte array with a charset.

How do you convert a byte array into a string?

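There are two ways to convert a byte array to a String: use the String constructor that takes only the byte array, which decodes with the platform's default charset, or pass the charset explicitly, for example UTF-8.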
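A minimal sketch of the explicit-charset route, assuming Java 7 or later for StandardCharsets. The class name RoundTrip and the helper decodeUtf8 are made up for illustration; the test character is the one from the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTrip {
    // Decode with an explicit charset rather than the platform default.
    // (decodeUtf8 is a hypothetical helper name, not part of the JDK.)
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Encode the character from the question, then decode it again.
        byte[] encoded = "\u1251".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(encoded)); // [-31, -119, -111]

        String decoded = decodeUtf8(encoded);
        System.out.println(decoded.equals("\u1251")); // true
    }
}
```

Passing the charset on both sides keeps the round trip independent of the machine's default encoding.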

Don't forget that bytes are signed in Java. So -2, -1 really means 0xFE 0xFF... and U+FEFF is the Unicode byte order mark (BOM)... that's what you're seeing here in the UTF-16 version.
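To see the raw values, you can mask each byte with 0xFF before printing it. This is only a sketch; the class SignedBytes and its helpers toUnsigned and toHex are names invented for this example.

```java
public class SignedBytes {
    // Mask with 0xFF to read a Java byte (-128..127) as an unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Format a byte array as space-separated two-digit hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", toUnsigned(b)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The UTF-16 output from the question for the string "s".
        byte[] utf16 = {-2, -1, 0, 115};
        System.out.println(toHex(utf16)); // FE FF 00 73
    }
}
```

Printed this way, the leading FE FF is easy to recognise as the BOM, followed by 00 73 for the letter s.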

To avoid getting the BOM when encoding, use UTF-16BE or UTF-16LE explicitly. (I would also suggest using the names which are guaranteed by the platform rather than just "utf8" etc. Admittedly the name is guaranteed to be found case-insensitively, but the lack of a hyphen makes it less reliable, and there are no downsides to using the canonical name.)
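As a rough illustration, here is the same string encoded with the explicit byte orders. It assumes Java 7 or later, where the StandardCharsets constants are available and avoid both the name lookup and the checked UnsupportedEncodingException. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NoBom {
    // Encode as big-endian UTF-16; no BOM is written.
    static byte[] bigEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    // Encode as little-endian UTF-16; no BOM is written.
    static byte[] littleEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bigEndian("s")));    // [0, 115]
        System.out.println(Arrays.toString(littleEndian("s"))); // [115, 0]
    }
}
```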

The -2, -1 is a byte order mark (BOM, U+FEFF) that indicates that the following text is encoded in UTF-16.

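You are getting this because UTF-16 has two byte orders, UTF-16LE and UTF-16BE, in which the two bytes of each 16-bit code unit are stored in little-endian or big-endian order. UTF-8 is a sequence of single bytes, so it has no byte-order ambiguity. UTF-32 also has big- and little-endian variants, but as your output shows, Java's "utf32" encoder wrote big-endian bytes without a BOM.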

As the values that come back are 0xFE 0xFF, the byte order is big-endian (UTF-16BE).
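The BOM matters in the other direction too. When decoding, Java's generic UTF-16 charset reads a leading BOM to choose the byte order and does not include it in the resulting string. A small sketch, with the class name BomDecode and helper decodeUtf16 invented for the example:

```java
import java.nio.charset.StandardCharsets;

public class BomDecode {
    // Decode with the generic UTF-16 charset, which uses a leading BOM
    // (if present) to pick the byte order and falls back to big-endian.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] bigEndian = {-2, -1, 0, 115};    // FE FF 00 73
        byte[] littleEndian = {-1, -2, 115, 0}; // FF FE 73 00

        System.out.println(decodeUtf16(bigEndian));    // s
        System.out.println(decodeUtf16(littleEndian)); // s
    }
}
```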

Tags:

java

string

unicode

mr. Vachovsky

2 Answers

Jon Skeet

Simon Callan
