I have a text containing Latin, Cyrillic and Chinese characters.
I am trying to compress a String (via byte[]) with GZIPOutputStream and decompress it with GZIPInputStream, but I cannot convert all characters back to the originals. Some appear as ?.
I thought that UTF-16 would do the job.
Any help?
Regards
Here's my code:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressUncompressStrings {

    public static void main(String[] args) {
        String sTestString = "äöüäöü 长安";
        System.out.println(sTestString);
        // Encode with an explicit charset; getBytes() without one uses the default charset.
        byte[] bcompressed = compress(sTestString.getBytes(StandardCharsets.UTF_16));
        //byte[] bcompressed = compress(sTestString.getBytes());
        String sDecompressed = decompress(bcompressed);
        System.out.println(sDecompressed);
    }

    public static byte[] compress(byte[] content) {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        // Closing the GZIP stream (done by try-with-resources) writes the GZIP trailer.
        try (GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream)) {
            gzipOutputStream.write(content);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return byteArrayOutputStream.toByteArray();
    }

    public static String decompress(byte[] contentBytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPInputStream gzipInputStream = new GZIPInputStream(new ByteArrayInputStream(contentBytes))) {
            // Read in chunks instead of one byte at a time.
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzipInputStream.read(buffer)) != -1) {
                baos.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        // Decode with the same charset that was used for encoding.
        return new String(baos.toByteArray(), StandardCharsets.UTF_16);
    }
}
I suspect it's just the console that's having a problem. I tried the above code, and although it didn't print out any of the characters properly, when I tested the round-tripping of the string, it was fine:
System.out.println(sDecompressed.equals(sTestString)); // Prints true
What does that do on your machine?
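To check the round trip without relying on how the console renders characters, one option is to print only ASCII: the result of equals plus the code points of the restored string. The following is a minimal sketch, not the asker's code; the class RoundTripCheck and the helpers roundTrip and toCodePoints are hypothetical names introduced for illustration. The test string is written with Unicode escapes so that the source file's encoding cannot affect it either.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class RoundTripCheck {

    // Hypothetical helper: gzip the UTF-16 bytes, gunzip them, and decode again.
    public static String roundTrip(String text) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
                gzip.write(text.getBytes(StandardCharsets.UTF_16));
            }
            ByteArrayOutputStream restored = new ByteArrayOutputStream();
            try (GZIPInputStream gunzip =
                     new GZIPInputStream(new ByteArrayInputStream(compressed.toByteArray()))) {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = gunzip.read(buffer)) != -1) {
                    restored.write(buffer, 0, read);
                }
            }
            return new String(restored.toByteArray(), StandardCharsets.UTF_16);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: render each code point as U+XXXX so the output is pure ASCII.
    public static String toCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "äöü 长安" written with escapes
        String original = "\u00e4\u00f6\u00fc \u957f\u5b89";
        String restored = roundTrip(original);
        System.out.println(restored.equals(original)); // prints true
        System.out.println(toCodePoints(restored));
        // prints U+00E4 U+00F6 U+00FC U+0020 U+957F U+5B89
    }
}
```

If equals prints true and the code points match, the compression and decoding are working and any ? characters come from the output side.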
Displaying non-ASCII characters on console output is not easy. Assuming you're using Windows as your operating system (its command line doesn't use Unicode by default), you can change the active code page with the chcp command. I don't know how it's done through code, but I suggest running the code from the command line.
Running chcp 65001 tells Windows to use UTF-8 in its console.
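On the Java side, one possible approach is to wrap System.out in a PrintStream that encodes its output as UTF-8, so the bytes the program writes match what a UTF-8 console expects. This is only a sketch under assumptions: the class Utf8Console and the helper utf8Stream are hypothetical names, and it does not change the console's code page itself. The console still has to be switched to UTF-8 and needs a font that contains the glyphs.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8Console {

    // Hypothetical helper: wrap a stream in an auto-flushing PrintStream that writes UTF-8.
    public static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a standard charset available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // "äöü 长安" written with escapes so the source file encoding does not matter
        out.println("\u00e4\u00f6\u00fc \u957f\u5b89");
    }
}
```

Whether the characters then display correctly still depends on the terminal, so treat this as something to try rather than a guaranteed fix.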
I hope this helps.