
How to improve GZIP performance

This piece of code is called more than 500k times, and each compressed byte[] is smaller than 1 KB. Every time the method is called, all of the streams have to be created again, so I am looking for a way to improve this code.

private byte[] unzip(byte[] data) throws IOException, DataFormatException {

    byte[] unzipData = new byte[4096];

    try (ByteArrayInputStream in = new ByteArrayInputStream(data);
         GZIPInputStream gzipIn = new GZIPInputStream(in);
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {

        int read = 0;
        while( (read = gzipIn.read(unzipData)) != -1) {
            out.write(unzipData, 0, read);
        }

        return out.toByteArray();
    }
}

I already tried to replace ByteArrayOutputStream with a ByteBuffer, but at the time of creation I don't know how many bytes I need to allocate.

Also, I tried to use Inflater, but I stumbled across the problem described here.

Are there any other ideas for improving the performance of this code?

UPDATE#1

  • Maybe this lib helps someone.
  • Also there is an open JDK-Bug.
Christian asked Sep 13 '15 13:09



2 Answers

  1. Profile your application, to be sure that you're really spending optimizable time in this function. It doesn't matter how many times you call this function; if it doesn't account for a significant fraction of overall program execution time, then optimization is wasted.

  2. Pre-size the ByteArrayOutputStream. The default buffer size is 32 bytes, and resizes require copying all existing bytes. If you know that your decoded arrays will be around 1k, use new ByteArrayOutputStream(2048).

  3. Rather than reading a byte at a time, read a block at a time, using a pre-allocated byte[]. Beware that you must pass the return value from read as the length argument to write. Better, use something like IOUtils.copy() from Apache Commons IO (formerly Jakarta Commons) to avoid mistakes.
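Points 2 and 3 can be sketched roughly as follows. This is a minimal sketch, not a measured optimization: the class name GzipUnzip, the constant EXPECTED_SIZE, and the zip helper are hypothetical names introduced only for illustration, and the 2048-byte figure assumes decoded arrays of around 1 KB as described in the question. The loop in the question already reads block-wise, so the main change here is the pre-sized output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipUnzip {

    // Assumption: decoded payloads are around 1 KB, so 2048 bytes usually avoids a resize.
    private static final int EXPECTED_SIZE = 2048;

    static byte[] unzip(byte[] data) throws IOException {
        byte[] buffer = new byte[EXPECTED_SIZE];
        try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(data));
             // Pre-sized, so the internal array rarely has to grow and be copied.
             ByteArrayOutputStream out = new ByteArrayOutputStream(EXPECTED_SIZE)) {
            int read;
            // read() may return fewer bytes than buffer.length; write only what was read.
            while ((read = gzipIn.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }

    // Hypothetical helper, used here only to produce test input.
    static byte[] zip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(out)) {
            gzipOut.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello gzip".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = unzip(zip(original));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
    }
}
```

Whether this helps at all depends on what the profiler from point 1 shows; if the time is spent inside the inflater itself, the output buffer size will make little difference.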

anon answered Oct 15 '22 09:10


I'm not sure if it applies in your case, but I've found an incredible speed difference between using the default buffer size of GZIPInputStream and increasing it to 65536.

example: using a 500M input file ->

new GZIPInputStream(new FileInputStream(path.toFile())) // takes 4 mins to process

vs

new GZIPInputStream(new FileInputStream(path.toFile()), 65536) // takes 10s


More details can be found here: http://java-performance.info/java-io-bufferedinputstream-and-java-util-zip-gzipinputstream/

Both BufferedInputStream and GZIPInputStream have internal buffers. The default size for the former is 8192 bytes and for the latter 512 bytes. Generally it is worth increasing either of these sizes to at least 65536.
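As a rough illustration of the buffer-size setting, the sketch below decompresses a file with a 65536-byte internal buffer and counts the decoded bytes. The class name GzipBufferSize and the method countBytes are hypothetical; only the two-argument GZIPInputStream(InputStream, int) constructor comes from the JDK. The timings quoted above were for a 500M file, so the benefit for the sub-1 KB in-memory arrays in the question is an open assumption that would need measuring.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

class GzipBufferSize {

    // Size suggested in the answer; the GZIPInputStream default is 512 bytes.
    static final int BUFFER_SIZE = 65536;

    // Decompresses the gzip file at the given path and returns the decoded length.
    static long countBytes(Path path) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(path), BUFFER_SIZE)) {
            byte[] chunk = new byte[BUFFER_SIZE];
            long total = 0;
            int read;
            while ((read = in.read(chunk)) != -1) {
                total += read;
            }
            return total;
        }
    }
}
```

For many tiny inputs, allocating a 64 KB buffer on every call may cost more than it saves, so it seems worth benchmarking both sizes before settling on one.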

prule answered Oct 15 '22 10:10