In PHP, how to decompress a file on the fly that was compressed twice?

Tags: php, gzip, zlib

I have a bigfile.gz.gz file that is… big. I would like to uncompress it on the fly. Ideally, this is what I have in mind:

$in = fopen('compress.zlib://compress.zlib://bigfile.gz.gz', 'rb');
while (!feof($in))
    print fread($in, 4096);
fclose($in);

However, compress.zlib:// cannot be chained that way:

PHP Warning:  fopen(): cannot represent a stream of type ZLIB as a File Descriptor in gztest.php on line 1

So I thought I’d combine gzopen() and compress.zlib:// together:

$in = gzopen('compress.zlib://bigfile.gz.gz', 'rb');
while (!gzeof($in))
    print gzread($in, 4096);
gzclose($in);

However, this only decompresses one level of gzip.

I tried about ten other approaches. Unfortunately, gzopen() does not work with php://memory if it has been written to using fwrite(), and stream_filter_append(… zlib.inflate …) cannot read gzipped files.

This is the best I could come up with, but it spawns two system processes, which has undesirable overhead:

$in = popen('zcat bigfile.gz.gz | gunzip', 'rb');
while (!feof($in))
    print fread($in, 4096);
pclose($in);

Can someone suggest something better maybe?

Asked by sam hocevar, Jul 29 '14 16:07


1 Answer

The zlib.inflate filter can decompress .gz files, but by default it expects a raw DEFLATE stream, so you need to strip the gzip header first. To do that on the fly, use a custom stream filter:

<?php

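class gzip_header_filter extends php_user_filter {

    // Number of header bytes stripped so far; 0 means the header is still pending.
    private $filtered = 0;

    public function filter($in, $out, &$consumed, $closing) {
        while ($bucket = stream_bucket_make_writeable($in)) {
            if ($this->filtered == 0) {
                // Assumes the whole gzip header sits in the first bucket.
                $header_len = 10; // fixed part of the gzip header
                $header = substr($bucket->data, 0, 10);
                $flags = ord($header[3]); // fourth byte holds the flag bits
                if ($flags & 0x08) {
                    // FNAME flag: a zero-terminated file name follows the fixed part
                    $header_len = strpos($bucket->data, "\0", 10) + 1;
                }
                // Drop the header so only the compressed payload is passed on.
                $bucket->data = substr($bucket->data, $header_len);
                $this->filtered = $header_len;
            }
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}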

stream_filter_register('gzip_header_filter', 'gzip_header_filter');

$in = fopen('bigfile.gz.gz', 'rb');
stream_filter_append($in, 'gzip_header_filter', STREAM_FILTER_READ);
stream_filter_append($in, 'zlib.inflate', STREAM_FILTER_READ);
stream_filter_append($in, 'gzip_header_filter', STREAM_FILTER_READ);
stream_filter_append($in, 'zlib.inflate', STREAM_FILTER_READ);

while (!feof($in))
    print fread($in, 4096);
fclose($in);

?>

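Note that the code above handles only the optional file name field. It does not handle the comment, extra field, or header CRC that a gzip header can also carry, and it assumes the whole header arrives in the first bucket.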
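
The optional header fields mentioned in the note follow a fixed order in the gzip format (RFC 1952), so the header length can be worked out from the flag byte. Here is a minimal sketch of that calculation, assuming the whole header is available in a single string. The function name gzip_header_length() is hypothetical; it is not part of PHP or of the filter above.

```php
<?php

/**
 * Return the length in bytes of the gzip header at the start of $data,
 * or null if $data does not start with a complete gzip header.
 * Hypothetical helper written against the RFC 1952 header layout.
 */
function gzip_header_length(string $data): ?int
{
    $len = strlen($data);

    // Fixed part: magic (2), method (1), flags (1), mtime (4), xfl (1), os (1).
    if ($len < 10 || substr($data, 0, 2) !== "\x1f\x8b") {
        return null;
    }
    $flags = ord($data[3]);
    $pos = 10;

    // FEXTRA: 2-byte little-endian length, followed by that many bytes.
    if ($flags & 0x04) {
        if ($len < $pos + 2) {
            return null;
        }
        $pos += 2 + unpack('v', substr($data, $pos, 2))[1];
    }

    // FNAME, then FCOMMENT: each is a zero-terminated string.
    foreach ([0x08, 0x10] as $bit) {
        if ($flags & $bit) {
            $end = ($pos <= $len) ? strpos($data, "\0", $pos) : false;
            if ($end === false) {
                return null;
            }
            $pos = $end + 1;
        }
    }

    // FHCRC: 2-byte checksum of the header.
    if ($flags & 0x02) {
        $pos += 2;
    }

    return $pos <= $len ? $pos : null;
}
```

For a single-member gzip string $gz, gzinflate(substr($gz, gzip_header_length($gz), -8)) should recover the payload, since the last 8 bytes are the CRC32 and size trailer. Inside a stream filter the same calculation would need buffering, because the header may be split across buckets.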

Answered by cleong, Oct 14 '22 11:10