
Deflater.deflate and small output buffers

Tags: java, deflate
I'm seeing strange behavior from the java.util.zip.Deflater.deflate(byte[] b, int off, int len, int flush) method on Java 8u45 when it is used with small output buffers.

(I'm working on some low level networking code related to WebSocket's upcoming permessage-deflate extension, so small buffers are a reality for me)

The example code:

package deflate;

import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class DeflaterSmallBufferBug
{
    public static void main(String[] args)
    {
        boolean nowrap = true;
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION,nowrap);

        byte[] input = "Hello".getBytes(StandardCharsets.UTF_8);

        System.out.printf("input is %,d bytes - %s%n",input.length,getHex(input,0,input.length));

        deflater.setInput(input);

        byte[] output = new byte[input.length];

        // break out of infinite loop seen with bug
        int maxloops = 10;

        // Compress the data
        while (maxloops-- > 0)
        {
            int compressed = deflater.deflate(output,0,output.length,Deflater.SYNC_FLUSH);
            System.out.printf("compressed %,d bytes - %s%n",compressed,getHex(output,0,compressed));

            if (compressed < output.length)
            {
                System.out.printf("Compress success");
                return;
            }
        }

        System.out.printf("Exited compress (maxloops left %d)%n",maxloops);
    }

    private static String getHex(byte[] buf, int offset, int len)
    {
        StringBuilder hex = new StringBuilder();
        hex.append('[');
        for (int i = offset; i < (offset + len); i++)
        {
            if (i > offset)
            {
                hex.append(' ');
            }
            hex.append(String.format("%02X",buf[i]));
        }
        hex.append(']');
        return hex.toString();
    }
}

In the above case, I'm attempting to generate compressed bytes for the input "Hello" using an output buffer of 5 bytes in length.

I would expect the following output buffers:

buffer 1 [ F2 48 CD C9 C9 ]
buffer 2 [ 07 00 00 00 FF ]
buffer 3 [ FF ]

Which translates as

[ F2 48 CD C9 C9 07 00 ] <-- the compressed data
[ 00 00 FF FF ]          <-- the deflate tail bytes
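
For reference, the split shown above can be checked mechanically. This is only a minimal sketch: the class and method names are hypothetical, and it assumes the output ends with exactly one 00 00 FF FF tail, as in the expected bytes listed here.

```java
import java.util.Arrays;

class DeflateTailSketch
{
    // The four bytes that end a sync-flushed deflate block.
    private static final byte[] TAIL = {0x00, 0x00, (byte)0xFF, (byte)0xFF};

    // True if 'data' ends with the 00 00 FF FF tail.
    static boolean endsWithTail(byte[] data)
    {
        if (data.length < TAIL.length)
        {
            return false;
        }
        byte[] end = Arrays.copyOfRange(data, data.length - TAIL.length, data.length);
        return Arrays.equals(end, TAIL);
    }

    // Returns 'data' without the tail, or an unchanged copy if no tail is present.
    static byte[] stripTail(byte[] data)
    {
        if (!endsWithTail(data))
        {
            return data.clone();
        }
        return Arrays.copyOf(data, data.length - TAIL.length);
    }

    public static void main(String[] args)
    {
        // The three expected buffers from above, concatenated.
        byte[] all = {(byte)0xF2, 0x48, (byte)0xCD, (byte)0xC9, (byte)0xC9,
                      0x07, 0x00, 0x00, 0x00, (byte)0xFF, (byte)0xFF};
        System.out.printf("tail present: %b, payload length: %d%n",
                endsWithTail(all), stripTail(all).length);
    }
}
```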

However, when Deflater.deflate() is used with a small buffer, the loop never terminates: every call returns a full 5 bytes of compressed data (this seems to happen only with buffers of 5 bytes or fewer).

Resulting output of running the above demo ...

input is 5 bytes - [48 65 6C 6C 6F]
compressed 5 bytes - [F2 48 CD C9 C9]
compressed 5 bytes - [07 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
Exited compress (maxloops left -1)

If you make the input/output larger than 5 bytes then the problem seems to go away. (Just make the input string "Hellox" to test this for yourself)

Results of making the buffer 6 bytes (input as "Hellox")

input is 6 bytes - [48 65 6C 6C 6F 78]
compressed 6 bytes - [F2 48 CD C9 C9 AF]
compressed 6 bytes - [00 00 00 00 FF FF]
compressed 5 bytes - [00 00 00 FF FF]
Compress success

Even these results are a bit quirky to me, as it seems there are two deflate tail-byte sequences present.

So, I guess my ultimate question is: am I missing something about Deflater usage that is making things odd for me, or does this point to a possible bug in the JVM Deflater implementation itself?

Update: Aug 7, 2015

This discovery has been accepted as bugs.java.com/JDK-8133170

Joakim Erdfelt asked Aug 06 '15 17:08



1 Answer

This is a zlib "feature", documented in zlib.h:

In the case of a Z_FULL_FLUSH or Z_SYNC_FLUSH, make sure that avail_out is greater than six to avoid repeated flush markers due to avail_out == 0 on return.

What is happening is that each call of deflate() with Z_SYNC_FLUSH is inserting a five-byte flush marker. Since you are not providing enough output space to get the marker, you call again to get more output, but are asking it to insert another flush marker at the same time.

What you should be doing is calling deflate() with Z_SYNC_FLUSH once, and then getting all of the available output with additional deflate() calls, if necessary, that use Z_NO_FLUSH (or NO_FLUSH in Java).
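
As a rough sketch of that pattern in Java: request SYNC_FLUSH once, then drain with NO_FLUSH for as long as the buffer keeps coming back full. The class and method names are made up for illustration. The 8-byte buffer used in main is an assumption, chosen so that the first call has room to queue the flush marker for this tiny input. Longer inputs or smaller buffers are not covered here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class SyncFlushOnceSketch
{
    // Compress 'input' as a raw deflate stream, requesting the sync flush only once.
    static byte[] compress(byte[] input, int bufferSize)
    {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try
        {
            deflater.setInput(input);
            byte[] output = new byte[bufferSize];
            ByteArrayOutputStream result = new ByteArrayOutputStream();

            // First call: ask for the flush marker a single time.
            int n = deflater.deflate(output, 0, output.length, Deflater.SYNC_FLUSH);
            result.write(output, 0, n);

            // A completely filled buffer means more output may be pending;
            // drain it without requesting another marker.
            while (n == output.length)
            {
                n = deflater.deflate(output, 0, output.length, Deflater.NO_FLUSH);
                result.write(output, 0, n);
            }
            return result.toByteArray();
        }
        finally
        {
            deflater.end();
        }
    }

    // Inflate a raw deflate stream into at most 'maxLength' bytes (used to check the round trip).
    static byte[] decompress(byte[] compressed, int maxLength)
    {
        Inflater inflater = new Inflater(true);
        try
        {
            inflater.setInput(compressed);
            byte[] buffer = new byte[maxLength];
            int n = inflater.inflate(buffer);
            return Arrays.copyOf(buffer, n);
        }
        catch (DataFormatException e)
        {
            throw new IllegalStateException("not a valid deflate stream", e);
        }
        finally
        {
            inflater.end();
        }
    }

    public static void main(String[] args)
    {
        byte[] input = "Hello".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = compress(input, 8); // 8 is an assumed buffer size for this demo
        byte[] restored = decompress(compressed, input.length);
        System.out.printf("compressed %d bytes, restored \"%s\"%n",
                compressed.length, new String(restored, StandardCharsets.UTF_8));
    }
}
```

Feeding the result of compress back through decompress should return the original bytes, with the 00 00 FF FF tail at the end of the compressed output.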

Mark Adler answered Sep 22 '22 18:09
