
Byte array to String and back.. issues with -127

Tags: java, scala

In the following:

    scala> (new String(Array[Byte](1, 2, 3, -1, -2, -127))).getBytes
    res12: Array[Byte] = Array(1, 2, 3, -1, -2, 63)

Why is -127 converted to 63, and how do I get it back as -127?

[EDIT:] Java version below (to show that it's not just a "Scala problem")

    c:\tmp>type Main.java
    public class Main {
        public static void main(String [] args) {
            byte [] b = {1, 2, 3, -1, -2, -127};
            byte [] c = new String(b).getBytes();
            for (int i = 0; i < 6; i++){
                System.out.println("b:"+b[i]+"; c:"+c[i]);
            }
        }
    }

    c:\tmp>javac Main.java

    c:\tmp>java Main
    b:1; c:1
    b:2; c:2
    b:3; c:3
    b:-1; c:-1
    b:-2; c:-2
    b:-127; c:63
Jus12 asked Mar 09 '11 18:03



2 Answers

The constructor you're calling, String(byte[] bytes), makes it non-obvious that converting bytes to a string always involves decoding: it behaves like String(byte[] bytes, Charset charset) with the platform's default charset. What you want is no decoding at all.

Fortunately, there's a constructor for that: String(char[] value).

Now you have the data in a string, but you want it back exactly as it was. But guess what: getBytes() applies an encoding automatically as well, just like getBytes(Charset charset) with the default charset. Fortunately, there is a toCharArray() method.

If you must start with bytes and end with bytes, you then have to map the bytes to chars on the way in and the chars back to bytes on the way out:

(new String(Array[Byte](1,2,3,-1,-2,-127).map(_.toChar))).toCharArray.map(_.toByte) 

So, to summarize: converting between String and Array[Byte] involves encoding and decoding. If you want to put binary data in a string, you have to do it at the level of characters. Note, however, that this will give you a garbage string (i.e. the result will not be well-formed UTF-16, as String is expected to be), and so you'd better read it out as characters and convert it back to bytes.
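Since the question also shows a Java version, here is a minimal Java sketch of the same char-level round trip. The class and method names (CharRoundTrip, bytesToString, stringToBytes) are made up for illustration and do not come from the answer; the casts mirror what `_.toChar` and `_.toByte` do in the Scala one-liner.

```java
import java.util.Arrays;

// Hypothetical helper class; the names are illustrative only.
class CharRoundTrip {
    // Widen each byte to a char (sign-extended, like Scala's Byte.toChar),
    // so no charset decoding is involved.
    static String bytesToString(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) bytes[i];
        }
        return new String(chars);
    }

    // Narrow each char back to a byte, again without any charset encoding.
    static byte[] stringToBytes(String s) {
        char[] chars = s.toCharArray();
        byte[] bytes = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            bytes[i] = (byte) chars[i];
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};
        byte[] restored = stringToBytes(bytesToString(original));
        System.out.println(Arrays.toString(restored)); // [1, 2, 3, -1, -2, -127]
    }
}
```

The intermediate string is only a container here; printing it or passing it through anything that encodes text would mangle the data again.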

You could shift the bytes up by, say, adding 512; then you'd get a bunch of valid single Char code points. But this is using 16 bits to represent every 8, a 50% encoding efficiency. Base64 is a better option for serializing binary data (8 bits to represent 6, 75% efficient).
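For the Base64 route, a minimal sketch could look like the following. It assumes Java 8 or later, since java.util.Base64 did not exist when the question was asked; the class name Base64RoundTrip and its two helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class; assumes Java 8+ for java.util.Base64.
class Base64RoundTrip {
    // Encode arbitrary bytes as text that only uses ASCII characters.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Decode the Base64 text back into the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};
        String text = encode(original);
        System.out.println(text);                           // AQID//6B
        System.out.println(Arrays.toString(decode(text)));  // [1, 2, 3, -1, -2, -127]
    }
}
```

Six input bytes become eight characters, which is the 75% efficiency mentioned above, and the resulting string survives any ASCII-compatible encoding.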

Rex Kerr answered Oct 08 '22 11:10


String is for storing text, not binary data.

In your default character encoding there is no character for the byte -127 (0x81), so decoding turns it into a replacement character, and encoding that back produces '?', which is byte 63.
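To see the replacement happen in a reproducible way, here is a small sketch that fixes the charset explicitly. US-ASCII is an assumption chosen only because it gives the same result on every machine; the asker's actual default charset is not stated, and it clearly maps -1 and -2 to real characters, which US-ASCII does not. The class name ReplacementDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; US-ASCII is an assumed stand-in for "a charset
// that has no character for some byte values".
class ReplacementDemo {
    // Decode bytes to a String and encode the String back to bytes.
    static byte[] throughString(byte[] bytes) {
        // Bytes with no mapping decode to the replacement character U+FFFD.
        String decoded = new String(bytes, StandardCharsets.US_ASCII);
        // U+FFFD has no US-ASCII encoding, so it is written out as '?' (63).
        return decoded.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};
        System.out.println(Arrays.toString(throughString(original)));
        // [1, 2, 3, 63, 63, 63]
    }
}
```

The original byte value is already lost at the decoding step, so nothing done to the string afterwards can bring -127 back.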

EDIT: Base64 is the best option; even better would be to not use text to store binary data at all. It can be done, but not with an arbitrary character encoding: unless you pick one that maps every byte value to a character, you have to do the encoding yourself.

To answer your question literally, you can use your own character encoding. This is a very bad idea as any text is likely to get encoded and mangled in the same way as you have seen. Using Base64 avoids this by using characters which are safe in any encoding.

    byte[] bytes = new byte[256];
    for (int i = 0; i < bytes.length; i++)
        bytes[i] = (byte) i;

    // Deprecated String(byte[], int hibyte) constructor: each byte becomes
    // the low 8 bits of a char, with no charset involved.
    String text = new String(bytes, 0);

    // Read the chars back and narrow each one to a byte.
    byte[] bytes2 = new byte[text.length()];
    for (int i = 0; i < bytes2.length; i++)
        bytes2[i] = (byte) text.charAt(i);

    // Report any position that did not survive the round trip.
    int count = 0;
    for (int i = 0; i < bytes2.length; i++)
        if (bytes2[i] != (byte) i)
            System.out.println(i);
        else
            count++;
    System.out.println(count + " bytes matched.");

Peter Lawrey answered Oct 08 '22 11:10