In the following:
scala> (new String(Array[Byte](1, 2, 3, -1, -2, -127))).getBytes
res12: Array[Byte] = Array(1, 2, 3, -1, -2, 63)
Why is -127 converted to 63, and how do I get it back as -127?
[EDIT:] Java version below (to show that it's not just a "Scala problem")
c:\tmp>type Main.java
public class Main {
    public static void main(String [] args) {
        byte [] b = {1, 2, 3, -1, -2, -127};
        byte [] c = new String(b).getBytes();
        for (int i = 0; i < 6; i++){
            System.out.println("b:"+b[i]+"; c:"+c[i]);
        }
    }
}

c:\tmp>javac Main.java

c:\tmp>java Main
b:1; c:1
b:2; c:2
b:3; c:3
b:-1; c:-1
b:-2; c:-2
b:-127; c:63
The constructor you're calling makes it non-obvious that binary-to-string conversions use a decoding; the explicit form is String(byte[] bytes, Charset charset). What you want is to use no decoding at all.
Fortunately, there's a constructor for that: String(char[] value).
Now you have the data in a string, but you want it back exactly as is. But guess what: getBytes() is really getBytes(Charset charset) with the default charset. That's right, an encoding is applied automatically there too. Fortunately, there is a toCharArray() method.
If you must start with bytes and end with bytes, you then have to map the char arrays to bytes:
(new String(Array[Byte](1,2,3,-1,-2,-127).map(_.toChar))).toCharArray.map(_.toByte)
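For the Java version of the question, the same char-level round trip might look roughly like the sketch below. The class and method names (CharRoundTrip, bytesToString, stringToBytes) are made up for illustration. Masking each byte with 0xFF, so that every char lands in U+0000..U+00FF, is a choice made for this sketch and not something the one-liner above prescribes.

```java
import java.util.Arrays;

final class CharRoundTrip {
    // Map each byte to one char without consulting any charset.
    static String bytesToString(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            // Assumption for this sketch: mask so that -127 becomes 129 (0x81).
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return new String(chars); // String(char[] value): no decoding involved
    }

    // Read the chars back out and narrow each one to a byte.
    static byte[] stringToBytes(String s) {
        char[] chars = s.toCharArray(); // no encoding involved
        byte[] bytes = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            bytes[i] = (byte) chars[i];
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};
        byte[] restored = stringToBytes(bytesToString(original));
        System.out.println(Arrays.toString(restored)); // [1, 2, 3, -1, -2, -127]
    }
}
```

The string in the middle is still not meaningful text; it only survives as long as nothing encodes or decodes it along the way.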
So, to summarize: converting between String and Array[Byte] involves encoding and decoding. If you want to put binary data in a string, you have to do it at the level of characters. Note, however, that this will give you a garbage string (i.e. the result will not be well-formed UTF-16, as String is expected to be), and so you'd better read it out as characters and convert it back to bytes.
You could shift the bytes up by, say, adding 512; then you'd get a bunch of valid single Char code points. But this is using 16 bits to represent every 8, a 50% encoding efficiency. Base64 is a better option for serializing binary data (8 bits to represent 6, 75% efficient).
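Both options from the paragraph above can be sketched in Java as follows. This is only an illustration: BinaryAsText and its method names are hypothetical, the shift of 512 is the value suggested above, and the Base64 part assumes Java 8 or later, where java.util.Base64 is available.

```java
import java.util.Arrays;
import java.util.Base64;

final class BinaryAsText {
    static final int SHIFT = 512; // offset suggested in the text above

    // One char per byte: -128..127 becomes 384..639, all ordinary code points.
    static String shiftEncode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b + SHIFT));
        }
        return sb.toString();
    }

    static byte[] shiftDecode(String s) {
        byte[] bytes = new byte[s.length()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) (s.charAt(i) - SHIFT);
        }
        return bytes;
    }

    // Base64: four ASCII characters for every three bytes.
    static String base64Encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] base64Decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};

        byte[] viaShift = shiftDecode(shiftEncode(original));
        System.out.println(Arrays.toString(viaShift)); // [1, 2, 3, -1, -2, -127]

        String encoded = base64Encode(original);
        System.out.println(encoded); // AQID//6B
        System.out.println(Arrays.toString(base64Decode(encoded))); // [1, 2, 3, -1, -2, -127]
    }
}
```

The Base64 text is plain ASCII, so it should survive being written out and read back in whatever charset the platform happens to use, which the shifted string does not.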
String is for storing text, not binary data.
In your default character encoding there is no character for the byte -127 (0x81), so decoding replaces it with a placeholder, and getBytes then writes that placeholder back as '?', i.e. 63.
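The replacement can be reproduced on any machine by naming the charset explicitly. The sketch below uses US-ASCII, which is an assumption made for the demo and not the asker's actual default; in US-ASCII every negative byte is unmappable, so -1 and -2 get replaced as well, unlike in the question's output. ReplacementDemo and roundTrip are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ReplacementDemo {
    // Decode and re-encode with an explicit charset instead of the platform default.
    static byte[] roundTrip(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.US_ASCII);
        return decoded.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};

        // Bytes with no character in the charset decode to U+FFFD.
        String decoded = new String(original, StandardCharsets.US_ASCII);
        System.out.println((int) decoded.charAt(5)); // 65533

        // U+FFFD has no US-ASCII encoding, so it is written out as '?' (63).
        System.out.println(Arrays.toString(roundTrip(original))); // [1, 2, 3, 63, 63, 63]
    }
}
```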
EDIT: Base64 is the best option; even better would be to not use text to store binary data at all. It can be done, but not with any standard character encoding, i.e. you have to do the encoding yourself.
To answer your question literally, you can use your own character encoding. This is a very bad idea as any text is likely to get encoded and mangled in the same way as you have seen. Using Base64 avoids this by using characters which are safe in any encoding.
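// Fill an array with every possible byte value.
byte[] bytes = new byte[256];
for (int i = 0; i < bytes.length; i++)
    bytes[i] = (byte) i;

// Deprecated String(byte[] ascii, int hibyte) constructor: each byte becomes
// the low 8 bits of a char, with the high byte set to 0. No charset is used.
String text = new String(bytes, 0);

// Read the chars back and narrow each one to a byte.
byte[] bytes2 = new byte[text.length()];
for (int i = 0; i < bytes2.length; i++)
    bytes2[i] = (byte) text.charAt(i);

// Print the index of any mismatch and count the matches.
int count = 0;
for (int i = 0; i < bytes2.length; i++)
    if (bytes2[i] != (byte) i)
        System.out.println(i);
    else
        count++;
System.out.println(count + " bytes matched.");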