
How to mimic a "use_crc" flag on (hacked) 1998 uncompress() in 2013 zlib API interface?

I was updating a project's code from a 1998 version of zlib to a 2013 version of zlib. One thing that seemed to change is that there used to be a "use_crc" flag on the uncompress function, which appeared to have gone missing:

int ZEXPORT uncompress (dest, destLen, source, sourceLen, use_crc)
    Bytef *dest;
    uLongf *destLen;
    const Bytef *source;
    uLong sourceLen;
    int use_crc; // <-- vanished (?)

(UPDATE: as pointed out by @Joe, this is likely a third-party modification. Title updated accordingly. The rest of the question is still applicable, as in, "how should I best do this with today's stock zlib".)

In the code I'm studying, uncompress() is being called by something that deconstructs the binary format of a .zip and passes in a "payload" of data. The code had been passing the crc flag in as 1. If the flag was not used, it would get a Z_DATA_ERROR (-3). (A zlib with no use_crc flag gets Z_DATA_ERROR just as if the flag had been false.)

In experiments, I found that very small files worked without use_crc. The small counting files crossed over to not working between "12345678901234" and "123456789012345". The reason: that is the first file that zip deflated instead of storing uncompressed (at what zip called a savings of "6%").

While floundering with options to get zlib to accept it, I tried many things, including 16 + MAX_WBITS (which asks inflateInit2() for gzip decoding). Nothing seemed to process the payload out of zip test.zip test.txt the way the old code had.

If I was willing to subtract one from my destination size, I seemed to be able to suppress the failing check, at the loss of one byte. Here's the simple test program with the minimal zip payload hardcoded:

#include <stdio.h>
#include <string.h> /* memset */
#include "zlib.h"

int main(int argc, char *argv[]) {
    /* unsigned char: values such as 0x9C do not fit in a signed char */
    unsigned char compressed[] = { 0x78, 0x9C, 0x33, 0x34, 0x32, 0x36, 0x31, 0x35, 0x33,
        0xB7, 0xB0, 0x34, 0x30, 0x04, 0xB1, 0xB8, 0x00, 0x31, 0x30, 0xB1, 0x30,
        0x10, 0x00, 0x00, 0x00 }; // last 4 bytes are size (16)

    char uncompressed[16 + 1]; // account for null terminator
    int ret; z_stream strm;

    memset(uncompressed, 'X', 16);
    uncompressed[16] = '\0';

    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.total_out = 0;
    strm.avail_in = 25;
    strm.next_in = compressed;

    ret = inflateInit2(&strm, MAX_WBITS /* + 16 */); // it is Z_OK

    strm.avail_out = 15; // 16 gives error -3: "incorrect header check"
    strm.next_out = (Bytef *)uncompressed;
    ret = inflate(&strm, Z_NO_FLUSH);

    if (ret != /* Z_STREAM_END */ Z_OK) { // doesn't finish... 
        printf("inflate() error %d: %s\n", ret, strm.msg);
        return 2;
    }

    inflateEnd(&strm);
    printf("successful inflation: %s\n", uncompressed);
    return 0;
}

The output is:

successful inflation: 123456789012345X

This shows the data is being decompressed, but we need all 16 bytes. (The file ends with a newline, which should be the 16th byte received.) 16 + MAX_WBITS can't even get that far.

Any ideas what's going wrong? No permutation of settings seems to get there without errors.

HostileFork says dont trust SE asked Oct 02 '15 12:10


1 Answer

No, there have been no incompatible changes to the zlib interface since it was introduced over 20 years ago. There was never a use_crc argument to uncompress().

The example you give is a two-byte zlib header, deflate-compressed data, the CRC-32 of the deflate data in big-endian order, followed by a four-byte length in little-endian order. This is a truly odd mash-up of the zlib and gzip wrappers, and has nothing whatsoever to do with the zip format, which you keep mentioning. (What do you mean "payloads inside of zip files"?) zlib has an Adler-32 at the end in big-endian order, whereas gzip has a CRC-32 in little-endian order followed by a four-byte length in little-endian order. This one mixes those up, including the byte ordering, and then misleadingly puts a valid zlib header on the thing, which is an affront to all that is good and decent in this world.
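To make that layout concrete, here is a minimal sketch in C that splits such a buffer into its parts. The names hybrid_layout, split_hybrid, read_be32 and read_le32 are made up for illustration. It also assumes the deflate data ends exactly eight bytes before the end of the buffer, which only holds if nothing follows the length field; in general only inflate() can tell you where the deflate data really ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the hybrid stream; not a zlib type. */
typedef struct {
    int            header_ok;   /* 1 if the first two bytes pass the zlib header check */
    const uint8_t *deflate;     /* start of the raw deflate data */
    size_t         deflate_len; /* bytes between the header and the 8-byte trailer */
    uint32_t       crc32_be;    /* trailer bytes 0..3, assembled big-endian */
    uint32_t       length_le;   /* trailer bytes 4..7, assembled little-endian */
} hybrid_layout;

static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint32_t read_le32(const uint8_t *p) {
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf is too short for header + trailer.
   ASSUMPTION: the trailer is the last 8 bytes of buf. */
static int split_hybrid(const uint8_t *buf, size_t len, hybrid_layout *out) {
    if (len < 2 + 8)
        return -1;
    /* RFC 1950: compression method (low nibble of byte 0) is 8 for deflate,
       and the two header bytes read as a big-endian value are a multiple of 31. */
    out->header_ok = ((buf[0] & 0x0F) == 8) &&
                     ((((unsigned)buf[0] << 8) | buf[1]) % 31 == 0);
    out->deflate     = buf + 2;
    out->deflate_len = len - 2 - 8;
    out->crc32_be    = read_be32(buf + len - 8);
    out->length_le   = read_le32(buf + len - 4);
    return 0;
}
```

For the 25-byte example in the question, this reports a valid header, a 15-byte deflate region, a stored check value of 0x3130B130, and a stored length of 16.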

I'm pretty sure that whoever came up with this format was drunk at the time.

In order to decode this you will need to:

  1. Discard the first two bytes of the stream. (You can check that it is a valid zlib header, but that turns out to be meaningless in interpreting the rest of the stream.)

  2. Use raw deflate, initializing with inflateInit2(&strm, -15), to decompress the data. As you decompress, keep track of the total length and compute the CRC-32 using crc32().

  3. After the deflate data completes, read the next four bytes, assemble them in big-endian order to a 32-bit value, and compare that to the CRC-32 you computed. If it does not match, the stream is corrupted, or it is not one of these oddly formatted streams. (Maybe try again, decoding it as a normal zlib stream. If it had a good zlib header, then maybe that's what it actually is, as opposed to one of these Frankenstein streams.)

  4. Read the next four bytes and assemble those in little-endian order, and compare that to the length of the uncompressed data. If it does not match, then the stream is corrupted, or it's not what you think.

  5. If the data does not end here, then something else odd is going on. Consult the drunk person.

Mark Adler answered Sep 27 '22 23:09