Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ZLib decompression fails on large byte array

Tags:

java

arrays

zlib

When experimenting with ZLib compression, I have run across a strange problem. Decompressing a zlib-compressed byte array with random data fails reproducibly if the source array is at least 32752 bytes long. Here's a little program that reproduces the problem, you can see it in action on IDEOne. The compression and decompression methods are standard code picked off tutorials.

public class ZlibMain {

    private static byte[] compress(final byte[] data) {
        final Deflater deflater = new Deflater();
        deflater.setInput(data);

        deflater.finish();
        final byte[] bytesCompressed = new byte[Short.MAX_VALUE];
        final int numberOfBytesAfterCompression = deflater.deflate(bytesCompressed);
        final byte[] returnValues = new byte[numberOfBytesAfterCompression];
        System.arraycopy(bytesCompressed, 0, returnValues, 0, numberOfBytesAfterCompression);
        return returnValues;

    }

    private static byte[] decompress(final byte[] data) {
        final Inflater inflater = new Inflater();
        inflater.setInput(data);
        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length)) {
            final byte[] buffer = new byte[Math.max(1024, data.length / 10)];
            while (!inflater.finished()) {
                final int count = inflater.inflate(buffer);
                outputStream.write(buffer, 0, count);
            }
            outputStream.close();
            final byte[] output = outputStream.toByteArray();
            return output;
        } catch (DataFormatException | IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(final String[] args) {
        roundTrip(100);
        roundTrip(1000);
        roundTrip(10000);
        roundTrip(20000);
        roundTrip(30000);
        roundTrip(32000);
        for (int i = 32700; i < 33000; i++) {
            if(!roundTrip(i))break;
        }
    }

    private static boolean roundTrip(final int i) {
        System.out.printf("Starting round trip with size %d: ", i);
        final byte[] data = new byte[i];
        for (int j = 0; j < data.length; j++) {
            data[j]= (byte) j;
        }
        shuffleArray(data);

        final byte[] compressed = compress(data);
        try {
            final byte[] decompressed = CompletableFuture.supplyAsync(() -> decompress(compressed))
                                                         .get(2, TimeUnit.SECONDS);
            System.out.printf("Success (%s)%n", Arrays.equals(data, decompressed) ? "matching" : "non-matching");
            return true;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            System.out.println("Failure!");
            return false;
        }
    }

    // Implementing Fisher–Yates shuffle
    // source: https://stackoverflow.com/a/1520212/342852
    static void shuffleArray(byte[] ar) {
        Random rnd = ThreadLocalRandom.current();
        for (int i = ar.length - 1; i > 0; i--) {
            int index = rnd.nextInt(i + 1);
            // Simple swap
            byte a = ar[index];
            ar[index] = ar[i];
            ar[i] = a;
        }
    }
}

Is this a known bug in ZLib? Or do I have an error in my compress / decompress routines?

like image 909
Sean Patrick Floyd Avatar asked May 31 '17 11:05

Sean Patrick Floyd


People also ask

Can you compress byte array?

Compressing a byte array is a matter of recognizing repeating patterns within a byte sequence and devising a method that can represent the same underlying information to take advantage of these discovered repetitions.

What is zlib compression level?

The compression ratio of zlib ranges between 2:1 to 5:1. Additionally, it provides ten compression levels, and different levels have different compression ratios and speeds. zlib algorithm uses Deflate method for compression and Inflate method for decompression. Deflate method encodes the data into compressed data.

What does zlib compression do?

zlib compressed data are typically written with a gzip or a zlib wrapper. The wrapper encapsulates the raw DEFLATE data by adding a header and trailer. This provides stream identification and error detection that are not provided by the raw DEFLATE data.


2 Answers

It is an error in the logic of the compress / decompress methods; I am not this deep in the implementations but with debugging I found the following:

When the buffer of 32752 bytes is compressed, the deflater.deflate() method returns a value of 32767, this is the size to which you initialized the buffer in the line:

final byte[] bytesCompressed = new byte[Short.MAX_VALUE];

If you increase the buffer size for example to

final byte[] bytesCompressed = new byte[4 * Short.MAX_VALUE];

the you will see, that the input of 32752 bytes actually is deflated to 32768 bytes. So in your code, the compressed data does not contain all the data which should be in there.

When you then try to decompress, the inflater.inflate()method returns zero which indicates that more input data is needed. But as you only check for inflater.finished() you end in an endless loop.

So you can either increase the buffer size on compressing, but that probably just means haveing the problem with bigger files, or you better need to rewrite to compress/decompress logic to process your data in chunks.

like image 173
P.J.Meisch Avatar answered Nov 07 '22 12:11

P.J.Meisch


Apparently the compress() method was faulty. This one works:

public static byte[] compress(final byte[] data) {
    try (final ByteArrayOutputStream outputStream = 
                                     new ByteArrayOutputStream(data.length);) {

        final Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        final byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            final int count = deflater.deflate(buffer);
            outputStream.write(buffer, 0, count);
        }

        final byte[] output = outputStream.toByteArray();
        return output;
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}
like image 25
Sean Patrick Floyd Avatar answered Nov 07 '22 13:11

Sean Patrick Floyd