
Converting byte array to String (Java)

I'm writing a web application on Google App Engine. It lets people edit HTML code that gets stored as an .html file in the Blobstore.

I'm using fetchData to return a byte[] of all the characters in the file. I'm trying to print it into an HTML page so that the user can edit the HTML code. Everything works great!

Here's my only problem now:

The byte array has some issues when it is converted back to a string. Smart quotes and a couple of other characters come out looking funky (?'s, Japanese symbols, etc.). Specifically, I'm seeing several bytes with negative values, and those are the ones causing the problem.

The smart quotes are coming back as -108 and -109 in the byte array. Why is this and how can I decode the negative bytes to show the correct character encoding?

Josh asked Apr 15 '11 06:04



1 Answer

The byte array contains characters in a specific encoding, and you need to know which one in order to decode it. The way to convert it to a String is:

String decoded = new String(bytes, "UTF-8");  // example for one encoding type 

By the way, the raw bytes may appear as negative decimals simply because the Java data type byte is signed: it covers the range from -128 to 127.
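To make the signed/unsigned point concrete, here is a minimal sketch that prints the two bytes from the question next to their unsigned hex values. The class name `SignedByteDemo` and the helper `toUnsigned` are hypothetical names chosen for illustration; they are not part of the question's code.

```java
final class SignedByteDemo {
    // Hypothetical helper: maps a signed Java byte (-128..127)
    // to its unsigned value (0..255) by masking the low 8 bits.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte[] raw = {-109, -108}; // the two values reported in the question
        for (byte b : raw) {
            // Prints "-109 -> 0x93" and "-108 -> 0x94"
            System.out.printf("%d -> 0x%02X%n", b, toUnsigned(b));
        }
    }
}
```

On Java 8 and later, `Byte.toUnsignedInt(b)` should give the same result as the mask.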


-109 = 0x93: Control Code "Set Transmit State" 

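Read as the code point U+0093, the value (-109) is a non-printable control character in Unicode. A lone 0x93 byte is not valid UTF-8 either, because it is a continuation byte with no lead byte before it. So UTF-8 is not the correct encoding for that character stream.

0x93 in "Windows-1252" is the opening "smart quote" that you're looking for (0x94, i.e. -108, is the matching closing one). The Java name of that encoding is "Cp1252". The next line provides some test code: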
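One way to check this yourself is to decode the byte strictly and see whether UTF-8 accepts it. The following is only a sketch: the class name `Utf8Check` and the method `isValidUtf8` are made up for this example, and it assumes you have the whole byte array in memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    // Hypothetical helper: returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] lone = {-109}; // 0x93 with no lead byte in front of it
        System.out.println(isValidUtf8(lone)); // prints "false"

        // The String constructor does not throw; it substitutes U+FFFD,
        // which is typically what shows up as "?" or a box in the output.
        String lenient = new String(lone, StandardCharsets.UTF_8);
        System.out.printf("U+%04X%n", (int) lenient.charAt(0)); // prints "U+FFFD"
    }
}
```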

System.out.println(new String(new byte[]{-109}, "Cp1252"));  
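As a slightly fuller sketch of the same idea, the example below decodes both bytes from the question and prints the resulting code points rather than the characters themselves, so the result does not depend on the console's encoding. The class name `Cp1252Demo` and the method `decode` are hypothetical, and it assumes the runtime supports the "windows-1252" charset (the canonical name for what the answer calls "Cp1252").

```java
import java.nio.charset.Charset;

final class Cp1252Demo {
    // Looked up once; throws UnsupportedCharsetException if the JVM lacks it.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Hypothetical helper: decodes bytes that are assumed to be Windows-1252 text.
    static String decode(byte[] bytes) {
        return new String(bytes, WINDOWS_1252);
    }

    public static void main(String[] args) {
        // 0x93, 'h', 'i', 0x94 written as signed Java bytes
        byte[] raw = {-109, 104, 105, -108};
        String text = decode(raw);

        // Prints U+201C, U+0068, U+0069, U+201D (one per line):
        // the left and right double quotation marks around "hi".
        text.chars().forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```

Passing a `Charset` object instead of a charset name also avoids the checked `UnsupportedEncodingException` that the `String(byte[], String)` constructor declares.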
Andreas Dolk answered Sep 28 '22 08:09