I'm writing a web application in Google app Engine. It allows people to basically edit html code that gets stored as an .html
file in the blobstore.
I'm using fetchData to return a byte[]
of all the characters in the file. I'm trying to print to an html in order for the user to edit the html code. Everything works great!
Here's my only problem now:
The byte array is having some issues when converting back to a string. Smart quotes and a couple of characters are coming out looking funky. (?'s or japanese symbols etc.) Specifically it's several bytes I'm seeing that have negative values which are causing the problem.
The smart quotes are coming back as -108
and -109
in the byte array. Why is this and how can I decode the negative bytes to show the correct character encoding?
Convert byte[] to String (text data) toString() to get the string from the bytes; The bytes. toString() only returns the address of the object in memory, NOT converting byte[] to a string ! The correct way to convert byte[] to string is new String(bytes, StandardCharsets.
So below code can also be used to convert byte array to String in Java. String str = new String(byteArray, StandardCharsets. UTF_8); String class also has a method to convert a subset of the byte array to String.
Given a Byte value in Java, the task is to convert this byte value to string type. One method is to create a string variable and then append the byte value to the string variable with the help of + operator. This will directly convert the byte value to a string and add it in the string variable.
The byte array contains characters in a special encoding (that you should know). The way to convert it to a String is:
String decoded = new String(bytes, "UTF-8"); // example for one encoding type
By The Way - the raw bytes appear may appear as negative decimals just because the java datatype byte
is signed, it covers the range from -128 to 127.
-109 = 0x93: Control Code "Set Transmit State"
The value (-109) is a non-printable control character in UNICODE. So UTF-8 is not the correct encoding for that character stream.
0x93
in "Windows-1252" is the "smart quote" that you're looking for, so the Java name of that encoding is "Cp1252". The next line provides a test code:
System.out.println(new String(new byte[]{-109}, "Cp1252"));
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With