GZIPInputStream and Character Set

Question

I have a text containing Latin, Cyrillic and Chinese characters. I compress a String (via byte[]) with GZIPOutputStream and decompress it with GZIPInputStream, but I can't get all the characters back to the original ones. Some appear as ?.

I thought that UTF-16 would do the job.

Any help?

Regards

Here's my code:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressUncompressStrings {

    public static void main(String[] args) throws UnsupportedEncodingException {

        String sTestString = "äöüäöü 长安";
        System.out.println(sTestString);
        byte[] bcompressed = compress(sTestString.getBytes("UTF-16"));
        //byte[] bcompressed = compress(sTestString.getBytes());
        String sDecompressed = decompress(bcompressed);
        System.out.println(sDecompressed);
    }

    public static byte[] compress(byte[] content) {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        try {
            GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream);
            gzipOutputStream.write(content);
            // Closing finishes the gzip stream (writes the trailer) before the bytes are read.
            gzipOutputStream.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return byteArrayOutputStream.toByteArray();
    }

    public static String decompress(byte[] contentBytes) {
        try {
            GZIPInputStream gzipInputStream = new GZIPInputStream(new ByteArrayInputStream(contentBytes));
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            // Copy the decompressed bytes one at a time until end of stream (-1).
            int value;
            while ((value = gzipInputStream.read()) != -1) {
                baos.write(value);
            }
            gzipInputStream.close();
            // Decode with the same charset that was used to encode the string.
            return new String(baos.toByteArray(), "UTF-16");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
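For comparison, here is a more compact sketch of the same round trip. It assumes Java 9+ (for InputStream.readAllBytes) and uses StandardCharsets constants, so no UnsupportedEncodingException has to be declared. The class name GzipStringRoundTrip is just illustrative, and the test string is written with \u escapes so the result does not depend on the encoding the source file is compiled with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipStringRoundTrip {

    // Encode the text as UTF-16 and gzip the resulting bytes.
    public static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // try-with-resources closes (and therefore finishes) the gzip stream
        // before the buffer is read below.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_16));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Gunzip the bytes and decode them with the same charset used in compress().
    public static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_16); // readAllBytes: Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The question's test string, written as escapes.
        String original = "\u00e4\u00f6\u00fc\u00e4\u00f6\u00fc \u957f\u5b89";
        String restored = decompress(compress(original));
        // Compare the strings instead of judging by what the console displays.
        System.out.println(restored.equals(original));
    }
}
```

The key point is that the charset passed to getBytes and the one passed to new String(...) are the same; any charset that can represent every character (UTF-8 would work as well as UTF-16) gives a lossless round trip.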

Jon Skeet · Accepted Answer

I suspect it's just the console that's having a problem. I tried the above code, and although it didn't print out any of the characters properly, when I tested the round-tripping of the string, it was fine:

System.out.println(sDecompressed.equals(sTestString)); // Prints true

What does that do on your machine?
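To illustrate why a console can show ? even when the string itself is intact, here is a small sketch. When Java encodes text into a charset that cannot represent some characters, String.getBytes(Charset) substitutes the charset's replacement bytes, which is ? for US-ASCII and ISO-8859-1; System.out does the same kind of encoding with the console's charset. The class and method names (QuestionMarkDemo, throughCharset) are hypothetical, chosen only for this demo.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Encode the text with the given charset, then decode it again with the same charset.
    // String.getBytes(Charset) replaces every character the charset cannot represent
    // with the charset's replacement bytes, which is '?' for US-ASCII and ISO-8859-1.
    public static String throughCharset(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Umlauts plus two Chinese characters, written as escapes so the
        // source file's encoding does not matter.
        String text = "\u00e4\u00f6\u00fc \u957f\u5b89";

        System.out.println(throughCharset(text, StandardCharsets.US_ASCII));                // ??? ??
        System.out.println(throughCharset(text, StandardCharsets.ISO_8859_1).equals(text)); // false: CJK lost
        System.out.println(throughCharset(text, StandardCharsets.UTF_16).equals(text));     // true: lossless
    }
}
```

So a ? in the output only means some encoding step could not represent the character; the equals check above is what tells you whether the decompressed String is actually correct.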

Buhake Sindi · Answer

Displaying non-ASCII characters in console output is not easy. Assuming you're using Windows as your operating system (its command line doesn't use Unicode by default), you can change the active code page with the chcp command. I don't know how it's done through code, but I suggest running the code from the command line.

Running chcp 65001 tells Windows to use UTF-8 for its console (you can view a discussion here).

I hope this helps.
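On the Java side, one option is to make standard output encode as UTF-8 explicitly, so the bytes match what a console switched to code page 65001 expects. This is a sketch, not a guaranteed fix: it assumes Java 10+ (for the PrintStream(OutputStream, boolean, Charset) constructor), the helper name utf8 is hypothetical, and the characters still only render if the console decodes UTF-8 and its font has the glyphs.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {

    // Wrap a byte stream in an auto-flushing PrintStream that encodes text as UTF-8.
    // The (OutputStream, boolean, Charset) constructor requires Java 10+.
    public static PrintStream utf8(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Replace System.out with a UTF-8 stream over the raw standard output descriptor.
        System.setOut(utf8(new FileOutputStream(FileDescriptor.out)));
        // The question's test string as escapes; it displays correctly only if the
        // console itself decodes UTF-8 (e.g. after "chcp 65001" on Windows).
        System.out.println("\u00e4\u00f6\u00fc\u00e4\u00f6\u00fc \u957f\u5b89");
    }
}
```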

Tags: java, compression, gzipinputstream

mcflysoft

2 Answers

Jon Skeet

Buhake Sindi
