When experimenting with ZLib compression, I have run across a strange problem. Decompressing a zlib-compressed byte array with random data fails reproducibly if the source array is at least 32752 bytes long. Here's a little program that reproduces the problem, you can see it in action on IDEOne. The compression and decompression methods are standard code picked off tutorials.
public class ZlibMain {
private static byte[] compress(final byte[] data) {
final Deflater deflater = new Deflater();
deflater.setInput(data);
deflater.finish();
final byte[] bytesCompressed = new byte[Short.MAX_VALUE];
final int numberOfBytesAfterCompression = deflater.deflate(bytesCompressed);
final byte[] returnValues = new byte[numberOfBytesAfterCompression];
System.arraycopy(bytesCompressed, 0, returnValues, 0, numberOfBytesAfterCompression);
return returnValues;
}
private static byte[] decompress(final byte[] data) {
final Inflater inflater = new Inflater();
inflater.setInput(data);
try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length)) {
final byte[] buffer = new byte[Math.max(1024, data.length / 10)];
while (!inflater.finished()) {
final int count = inflater.inflate(buffer);
outputStream.write(buffer, 0, count);
}
outputStream.close();
final byte[] output = outputStream.toByteArray();
return output;
} catch (DataFormatException | IOException e) {
throw new RuntimeException(e);
}
}
public static void main(final String[] args) {
roundTrip(100);
roundTrip(1000);
roundTrip(10000);
roundTrip(20000);
roundTrip(30000);
roundTrip(32000);
for (int i = 32700; i < 33000; i++) {
if(!roundTrip(i))break;
}
}
private static boolean roundTrip(final int i) {
System.out.printf("Starting round trip with size %d: ", i);
final byte[] data = new byte[i];
for (int j = 0; j < data.length; j++) {
data[j]= (byte) j;
}
shuffleArray(data);
final byte[] compressed = compress(data);
try {
final byte[] decompressed = CompletableFuture.supplyAsync(() -> decompress(compressed))
.get(2, TimeUnit.SECONDS);
System.out.printf("Success (%s)%n", Arrays.equals(data, decompressed) ? "matching" : "non-matching");
return true;
} catch (InterruptedException | ExecutionException | TimeoutException e) {
System.out.println("Failure!");
return false;
}
}
// Implementing Fisher–Yates shuffle
// source: https://stackoverflow.com/a/1520212/342852
static void shuffleArray(byte[] ar) {
Random rnd = ThreadLocalRandom.current();
for (int i = ar.length - 1; i > 0; i--) {
int index = rnd.nextInt(i + 1);
// Simple swap
byte a = ar[index];
ar[index] = ar[i];
ar[i] = a;
}
}
}
Is this a known bug in ZLib? Or do I have an error in my compress / decompress routines?
Compressing a byte array is a matter of recognizing repeating patterns within a byte sequence and devising a method that can represent the same underlying information to take advantage of these discovered repetitions.
The compression ratio of zlib ranges between 2:1 to 5:1. Additionally, it provides ten compression levels, and different levels have different compression ratios and speeds. zlib algorithm uses Deflate method for compression and Inflate method for decompression. Deflate method encodes the data into compressed data.
zlib compressed data are typically written with a gzip or a zlib wrapper. The wrapper encapsulates the raw DEFLATE data by adding a header and trailer. This provides stream identification and error detection that are not provided by the raw DEFLATE data.
It is an error in the logic of the compress / decompress methods; I am not this deep in the implementations but with debugging I found the following:
When the buffer of 32752 bytes is compressed, the deflater.deflate()
method returns a value of 32767, this is the size to which you initialized the buffer in the line:
final byte[] bytesCompressed = new byte[Short.MAX_VALUE];
If you increase the buffer size for example to
final byte[] bytesCompressed = new byte[4 * Short.MAX_VALUE];
the you will see, that the input of 32752 bytes actually is deflated to 32768 bytes. So in your code, the compressed data does not contain all the data which should be in there.
When you then try to decompress, the inflater.inflate()
method returns zero which indicates that more input data is needed. But as you only check for inflater.finished()
you end in an endless loop.
So you can either increase the buffer size on compressing, but that probably just means haveing the problem with bigger files, or you better need to rewrite to compress/decompress logic to process your data in chunks.
Apparently the compress() method was faulty. This one works:
public static byte[] compress(final byte[] data) {
try (final ByteArrayOutputStream outputStream =
new ByteArrayOutputStream(data.length);) {
final Deflater deflater = new Deflater();
deflater.setInput(data);
deflater.finish();
final byte[] buffer = new byte[1024];
while (!deflater.finished()) {
final int count = deflater.deflate(buffer);
outputStream.write(buffer, 0, count);
}
final byte[] output = outputStream.toByteArray();
return output;
} catch (IOException e) {
throw new IllegalStateException(e);
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With