Is this a bug in this gzip inflate method?

When searching for how to inflate gzip-compressed data on iOS, the following method appears in a number of results:

- (NSData *)gzipInflate
{
    if ([self length] == 0) return self;

    unsigned full_length = [self length];
    unsigned half_length = [self length] / 2;

    NSMutableData *decompressed = [NSMutableData dataWithLength: full_length + half_length];
    BOOL done = NO;
    int status;

    z_stream strm;
    strm.next_in = (Bytef *)[self bytes];
    strm.avail_in = [self length];
    strm.total_out = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;

    if (inflateInit2(&strm, (15+32)) != Z_OK) return nil;
    while (!done)
    {
        // Make sure we have enough room and reset the lengths.
        if (strm.total_out >= [decompressed length])
            [decompressed increaseLengthBy: half_length];
        strm.next_out = [decompressed mutableBytes] + strm.total_out;
        strm.avail_out = [decompressed length] - strm.total_out;

        // Inflate another chunk.
        status = inflate (&strm, Z_SYNC_FLUSH);
        if (status == Z_STREAM_END) done = YES;
        else if (status != Z_OK) break;
    }
    if (inflateEnd (&strm) != Z_OK) return nil;

    // Set real length.
    if (done)
    {
        [decompressed setLength: strm.total_out];
        return [NSData dataWithData: decompressed];
    }
    else return nil;
}

But I've come across some examples of data (deflated on a Linux machine with Python's gzip module) that this method, running on iOS, fails to inflate. Here's what's happening:

In the last iteration of the while loop, inflate() returns Z_BUF_ERROR and the loop exits. But inflateEnd(), which is called after the loop, returns Z_OK. The code then concludes that, because inflate() never returned Z_STREAM_END, the inflation failed, and it returns nil.

According to the zlib FAQ (http://www.zlib.net/zlib_faq.html#faq05), Z_BUF_ERROR is not a fatal error, and my tests with a limited set of examples show that the data is inflated successfully if inflateEnd() returns Z_OK, even though the last call to inflate() did not return Z_OK. It seems as if inflateEnd() finished inflating the last chunk of data.

I don't know much about compression and how gzip works, so I'm hesitant to make changes to this code without fully understanding what it does. I'm hoping someone with more knowledge about the topic can shed some light on this potential logic flaw in the code above, and suggest a way to fix it.

Another implementation that Google turns up, which seems to suffer from the same problem, can be found here: https://github.com/nicklockwood/GZIP/blob/master/GZIP/NSData%2BGZIP.m

Edit:

So, it is a bug! Now, how do we fix it? Below is my attempt. Code review, anyone?

- (NSData *)gzipInflate
{
    if ([self length] == 0) return self;

    unsigned full_length = [self length];
    unsigned half_length = [self length] / 2;

    NSMutableData *decompressed = [NSMutableData dataWithLength: full_length + half_length];
    int status;

    z_stream strm;
    strm.next_in = (Bytef *)[self bytes];
    strm.avail_in = [self length];
    strm.total_out = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;

    if (inflateInit2(&strm, (15+32)) != Z_OK) return nil;

    do
    {
        // Make sure we have enough room and reset the lengths.
        if (strm.total_out >= [decompressed length])
            [decompressed increaseLengthBy: half_length];
        strm.next_out = [decompressed mutableBytes] + strm.total_out;
        strm.avail_out = [decompressed length] - strm.total_out;

        // Inflate another chunk.
        status = inflate (&strm, Z_SYNC_FLUSH);

        switch (status) {
            case Z_NEED_DICT:
                status = Z_DATA_ERROR;     /* and fall through */
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
            case Z_STREAM_ERROR:
                (void)inflateEnd(&strm);
                return nil;
        }
    } while (status != Z_STREAM_END);

    (void)inflateEnd (&strm);

    // Set real length.
    if (status == Z_STREAM_END)
    {
        [decompressed setLength: strm.total_out];
        return [NSData dataWithData: decompressed];
    }
    else return nil;
}

Edit 2:

Here's a sample Xcode project that illustrates the issue I'm running into. The deflate happens on the server side, and the data is base64-encoded and URL-encoded before being transported via HTTP. I've embedded the URL-encoded base64 string in ViewController.m. The URL-decode and base64-decode methods, as well as your gzipInflate methods, are in NSDataExtension.m.

https://dl.dropboxusercontent.com/u/38893107/gzip/GZIPTEST.zip

Here's the binary file as compressed by Python's gzip library:

https://dl.dropboxusercontent.com/u/38893107/gzip/binary.zip

This is the URL-encoded base64 string that gets transported over HTTP: https://dl.dropboxusercontent.com/u/38893107/gzip/urlEncodedBase64.txt
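Because the data passes through base64 and URL encoding on its way to gzipInflate, one quick sanity check is to confirm that the decoding steps really produce raw gzip bytes. A minimal sketch in C (which also compiles as Objective-C); looks_like_gzip is a hypothetical helper, and it checks only the fixed RFC 1952 member header (ID1 = 0x1f, ID2 = 0x8b, CM = 8 for deflate), not the validity of the rest of the stream:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of either method above): returns 1 if buf
   starts with a gzip member header as defined in RFC 1952:
   ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate). Returns 0 otherwise. */
static int looks_like_gzip(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len >= 3 && buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 0x08;
}
```

If the check fails and the buffer starts with the ASCII text H4sI, it is most likely still base64-encoded: H4sI is the base64 encoding of the bytes 1f 8b 08.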

asked Jul 23 '13 20:07 by subjective-c

2 Answers

Yes, it's a bug.

It is in fact correct that if inflate() does not return Z_STREAM_END, then you have not completed inflation. inflateEnd() returning Z_OK doesn't really mean much -- just that it was given a valid state and was able to free the memory.

So inflate() must eventually return Z_STREAM_END before you can declare success. However, Z_BUF_ERROR is not a reason to give up. In that case you simply call inflate() again with more input or more output space, and you will then get Z_STREAM_END.

From the documentation in zlib.h:

/* ...
Z_BUF_ERROR if no progress is possible or if there was not enough room in the
output buffer when Z_FINISH is used.  Note that Z_BUF_ERROR is not fatal, and
inflate() can be called again with more input and more output space to
continue decompressing.
... */
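The rule stated above (only Z_STREAM_END means success; Z_BUF_ERROR means call again with more input or more output space) can be condensed into a small decision function. This is a sketch, not zlib API: next_action and inflate_action are hypothetical names, and the return-code values are copied from zlib.h only so the snippet builds without zlib. The one case it spells out beyond the text is truncation: if inflate() returns Z_BUF_ERROR while there is no input left to give it and the output buffer still has room, there is nothing more to offer it, so the stream is incomplete.

```c
#include <stddef.h>

/* zlib return codes, mirrored from zlib.h so this sketch compiles on its own.
   In real code, include <zlib.h> instead. */
#ifndef Z_OK
#define Z_OK            0
#define Z_STREAM_END    1
#define Z_NEED_DICT     2
#define Z_STREAM_ERROR (-2)
#define Z_DATA_ERROR   (-3)
#define Z_MEM_ERROR    (-4)
#define Z_BUF_ERROR    (-5)
#endif

typedef enum { INFLATE_CONTINUE, INFLATE_DONE, INFLATE_FAIL } inflate_action;

/* Hypothetical helper: decide what an inflate() loop should do after a call.
   input_left: input bytes not yet consumed (strm.avail_in plus anything not
               yet handed to inflate()).
   avail_out:  strm.avail_out after the call. */
static inflate_action next_action(int status, size_t input_left,
                                  unsigned avail_out)
{
    switch (status) {
    case Z_STREAM_END:
        return INFLATE_DONE;        /* the only signal that inflation completed */
    case Z_OK:
        return INFLATE_CONTINUE;
    case Z_BUF_ERROR:
        /* Not fatal, but retrying only helps if inflate() can be given more
           input or more output space. No input left and free output space
           means the gzip stream is truncated. */
        if (input_left == 0 && avail_out != 0)
            return INFLATE_FAIL;
        return INFLATE_CONTINUE;    /* grow the output or feed more input */
    default:
        return INFLATE_FAIL;        /* Z_NEED_DICT, Z_DATA_ERROR, Z_MEM_ERROR, ... */
    }
}
```

That last case matters for a loop that simply repeats until Z_STREAM_END, like the attempt in the question's first edit: on a truncated stream, inflate() keeps returning Z_BUF_ERROR and such a loop never exits.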

Update:

Since there is buggy code floating around out there, below is the proper code to implement the desired method. This code handles incomplete gzip streams, concatenated gzip streams, and very large gzip streams. For very large gzip streams, the unsigned lengths in the z_stream are not large enough when compiled as a 64-bit executable. NSUInteger is 64 bits, whereas unsigned is 32 bits. In that case, you have to loop on the input to feed it to inflate().
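Looping on the input amounts to handing inflate() at most UINT_MAX bytes at a time out of a 64-bit count, which is what the answer's strm.avail_in = left > UINT_MAX ? UINT_MAX : (unsigned)left; followed by left -= strm.avail_in; does. A standalone C sketch of that clamp; next_chunk is a hypothetical helper, size_t stands in for NSUInteger, and the cap parameter (normally UINT_MAX) exists only so the logic can be exercised with small numbers:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: take the next chunk of at most cap bytes from the
   remaining input count *left, returning it as the unsigned value that
   z_stream.avail_in can hold. */
static unsigned next_chunk(size_t *left, size_t cap)
{
    assert(left != NULL);
    assert(cap <= UINT_MAX);                  /* chunk must fit in avail_in */
    size_t n = *left > cap ? cap : *left;
    *left -= n;                               /* advance past this chunk */
    return (unsigned)n;
}
```

The answer applies the same clamp on the output side (strm.avail_out), since both avail_in and avail_out are declared as uInt in z_stream.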

The method below simply returns nil on any error; the nature of the error is noted in a comment after each return nil, in case more sophisticated error handling is desired.

- (NSData *) gzipInflate
{
    z_stream strm;

    // Initialize input
    strm.next_in = (Bytef *)[self bytes];
    NSUInteger left = [self length];        // input left to decompress
    if (left == 0)
        return nil;                         // incomplete gzip stream

    // Create starting space for output (guess double the input size, will grow
    // if needed -- in an extreme case, could end up needing more than 1000
    // times the input size)
    NSUInteger space = left << 1;
    if (space < left)
        space = NSUIntegerMax;
    NSMutableData *decompressed = [NSMutableData dataWithLength: space];
    space = [decompressed length];

    // Initialize output
    strm.next_out = (Bytef *)[decompressed mutableBytes];
    NSUInteger have = 0;                    // output generated so far

    // Set up for gzip decoding
    strm.avail_in = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    int status = inflateInit2(&strm, (15+16));
    if (status != Z_OK)
        return nil;                         // out of memory

    // Decompress all of self
    do {
        // Allow for concatenated gzip streams (per RFC 1952)
        if (status == Z_STREAM_END)
            (void)inflateReset(&strm);

        // Provide input for inflate
        if (strm.avail_in == 0) {
            strm.avail_in = left > UINT_MAX ? UINT_MAX : (unsigned)left;
            left -= strm.avail_in;
        }

        // Decompress the available input
        do {
            // Allocate more output space if none left
            if (space == have) {
                // Double space, handle overflow
                space <<= 1;
                if (space < have) {
                    space = NSUIntegerMax;
                    if (space == have) {
                        // space was already maxed out!
                        (void)inflateEnd(&strm);
                        return nil;         // output exceeds integer size
                    }
                }

                // Increase space
                [decompressed setLength: space];
                space = [decompressed length];

                // Update output pointer (might have moved)
                strm.next_out = (Bytef *)[decompressed mutableBytes] + have;
            }

            // Provide output space for inflate
            strm.avail_out = space - have > UINT_MAX ? UINT_MAX :
                             (unsigned)(space - have);
            have += strm.avail_out;

            // Inflate and update the decompressed size
            status = inflate (&strm, Z_SYNC_FLUSH);
            have -= strm.avail_out;

            // Bail out if any errors
            if (status != Z_OK && status != Z_BUF_ERROR &&
                status != Z_STREAM_END) {
                (void)inflateEnd(&strm);
                return nil;                 // invalid gzip stream
            }

            // Repeat until all output is generated from provided input (note
            // that even if strm.avail_in is zero, there may still be pending
            // output -- we're not done until the output buffer isn't filled)
        } while (strm.avail_out == 0);

        // Continue until all input consumed
    } while (left || strm.avail_in);

    // Free the memory allocated by inflateInit2()
    (void)inflateEnd(&strm);

    // Verify that the input is a valid gzip stream
    if (status != Z_STREAM_END)
        return nil;                         // incomplete gzip stream

    // Set the actual length and return the decompressed data
    [decompressed setLength: have];
    return decompressed;
}
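The output-growth step in the code above doubles space but saturates at NSUIntegerMax instead of wrapping around, and gives up only when the buffer is already at that maximum. A C sketch of the same rule, using size_t in place of NSUInteger; grow_size is a hypothetical name, and unlike the answer (where space starts at twice a nonzero input length) it also handles a size of zero:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: double *size, saturating at SIZE_MAX on overflow.
   Returns 0 and leaves *size unchanged if it is already SIZE_MAX, i.e. the
   output buffer cannot grow any further. */
static int grow_size(size_t *size)
{
    assert(size != NULL);
    if (*size == SIZE_MAX)
        return 0;                       /* already maxed out */
    if (*size == 0) {
        *size = 1;                      /* doubling zero would never grow */
        return 1;
    }
    size_t doubled = *size << 1;
    *size = doubled < *size ? SIZE_MAX : doubled;   /* wrapped: saturate */
    return 1;
}
```

Saturating rather than failing on the first overflow lets the buffer grow to the full integer range before the method gives up with its "output exceeds integer size" error.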
answered Oct 17 '22 04:10 by Mark Adler


Yes, it looks like a bug. According to this annotated example from the zlib site, Z_BUF_ERROR only indicates that inflate() cannot produce more output until it is given more input; it is not in itself a reason to abort the inflate loop.

In fact, the linked sample seems to handle Z_BUF_ERROR exactly like Z_OK.

answered Oct 17 '22 06:10 by Joachim Isaksson