
How does gzcompress work?

I'm wondering why I need to cut off the last 4 characters after using gzcompress().

Here is my code:

header("Content-Encoding: gzip");
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";
$index = $smarty->fetch("design/templates/main.htm") ."\n<!-- Compressed by gzip -->";
$this->content_size = strlen($index);
$this->content_crc = crc32($index);
$index = gzcompress($index, 9);
$index = substr($index, 0, strlen($index) - 4); // Why cut off ??
echo $index;
echo pack('V', $this->content_crc) . pack('V', $this->content_size);

When I don't cut off the last 4 chars, the source ends like this:

[...]
<!-- Compressed by gzip -->N

When I cut them off it reads:

[...]
<!-- Compressed by gzip -->

I could see the additional N only in Chrome's code inspector (not in Firefox's or IE's source view). But there seem to be four additional characters at the end of the code.

Can anyone explain to me why I need to cut off 4 chars?

JochenJung asked Jan 21 '23 19:01


1 Answer

gzcompress() implements the ZLIB compressed data format (RFC 1950), which has the following structure:

     0   1
   +---+---+
   |CMF|FLG|   (more-->)
   +---+---+

(if FLG.FDICT set)

     0   1   2   3
   +---+---+---+---+
   |     DICTID    |   (more-->)
   +---+---+---+---+

   +=====================+---+---+---+---+
   |...compressed data...|    ADLER32    |
   +=====================+---+---+---+---+

Here you see that the last four bytes are an Adler-32 checksum.
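To make that layout concrete, here is a minimal sketch that splits the output of gzcompress() into the three parts shown above and compares the trailer with an independently computed Adler-32. The helper name split_zlib() and the sample string are made up for illustration, and the split assumes FLG.FDICT is not set (no DICTID field), which is the case for gzcompress() output.

```php
<?php
// Hypothetical helper: split a ZLIB stream into header, body and trailer.
// Assumes FLG.FDICT is not set, so no DICTID field follows the header.
function split_zlib(string $zlib): array {
    return [
        'header'  => substr($zlib, 0, 2),   // CMF, FLG
        'body'    => substr($zlib, 2, -4),  // raw DEFLATE data
        'adler32' => substr($zlib, -4),     // Adler-32, stored big-endian
    ];
}

$data  = "Hello, zlib!";                    // illustrative input
$parts = split_zlib(gzcompress($data, 9));

// The low nibble of CMF is the compression method (8 = DEFLATE).
var_dump((ord($parts['header'][0]) & 0x0f) === 8);

// The trailer should equal the Adler-32 of the uncompressed input.
var_dump($parts['adler32'] === hash('adler32', $data, true));

// What remains in the middle is plain DEFLATE data.
var_dump(gzinflate($parts['body']) === $data);
```

The third check is the interesting one for the question: once the two header bytes and the four checksum bytes are removed, what is left is ordinary DEFLATE data.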

In contrast, the GZIP file format (RFC 1952) is a sequence of so-called members, each with the following structure:

   +---+---+---+---+---+---+---+---+---+---+
   |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
   +---+---+---+---+---+---+---+---+---+---+

(if FLG.FEXTRA set)

   +---+---+=================================+
   | XLEN  |...XLEN bytes of "extra field"...| (more-->)
   +---+---+=================================+

(if FLG.FNAME set)

   +=========================================+
   |...original file name, zero-terminated...| (more-->)
   +=========================================+

(if FLG.FCOMMENT set)

   +===================================+
   |...file comment, zero-terminated...| (more-->)
   +===================================+

(if FLG.FHCRC set)

   +---+---+
   | CRC16 |
   +---+---+

   +=======================+
   |...compressed blocks...| (more-->)
   +=======================+

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   |     CRC32     |     ISIZE     |
   +---+---+---+---+---+---+---+---+

As you can see, GZIP uses a CRC-32 checksum for the integrity check.
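The member trailer can be inspected the same way. The following sketch lets gzencode() build a complete GZIP member and then reads back the last eight bytes; the helper name gzip_trailer() and the sample string are assumptions for illustration only.

```php
<?php
// Hypothetical helper: read CRC32 and ISIZE from the end of a GZIP member.
// Both fields are unsigned 32-bit little-endian values ('V' in pack/unpack).
function gzip_trailer(string $gz): array {
    return unpack('Vcrc32/Visize', substr($gz, -8));
}

$data = "Hello, gzip!";                     // illustrative input
$gz   = gzencode($data, 9);                 // full GZIP member: header, DEFLATE data, trailer
$t    = gzip_trailer($gz);

// ID1, ID2 and CM are the fixed bytes 0x1f 0x8b 0x08.
var_dump(substr($gz, 0, 3) === "\x1f\x8b\x08");

// CRC32 is the CRC-32 of the uncompressed data, ISIZE its length (modulo 2^32).
var_dump($t['crc32'] === crc32($data));
var_dump($t['isize'] === strlen($data));
```

These are the same two values that the code in the question writes with pack('V', ...) after the compressed data.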

So to analyze your code:

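  • echo "\x1f\x8b\x08\x00\x00\x00\x00\x00"; – puts out the first eight bytes of the GZIP member header:
    • 0x1f 0x8b – ID1 and ID2, fixed values that identify the data format
    • 0x08 – CM, the compression method; 8 denotes the DEFLATE data compression format (RFC 1951)
    • 0x00 – FLG, flags (none set, so none of the optional fields follow)
    • 0x00000000 – MTIME, modification time
    • the fields XFL (extra flags) and OS (operating system) are not written explicitly; the first two bytes of the gzcompress() output (the ZLIB CMF and FLG bytes) end up in their place
  • echo $index; – puts out the gzcompress() output without its last four bytes, i.e. the two ZLIB header bytes followed by the DEFLATE compressed data
  • echo pack('V', $this->content_crc) . pack('V', $this->content_size); – puts out the CRC-32 checksum and the size of the uncompressed input data, each as a 32-bit little-endian value (the CRC32 and ISIZE fields)

So the four bytes you cut off are the Adler-32 checksum of the ZLIB format. A GZIP member has no place for it: the compressed blocks must be followed directly by CRC32 and ISIZE. If you leave it in, four stray bytes sit between the compressed data and the GZIP trailer.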
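A way to avoid the trimming altogether is to start from raw DEFLATE data. The sketch below is one possible variant, not the code from the question: it uses gzdeflate(), which emits DEFLATE data without any ZLIB wrapper, and writes all ten header bytes explicitly. The function name build_gzip_member(), the sample HTML, and the choice of XFL = 0 and OS = 255 ("unknown" in RFC 1952) are assumptions for illustration.

```php
<?php
// Hypothetical helper: assemble a single GZIP member by hand.
function build_gzip_member(string $data, int $level = 9): string {
    $header = "\x1f\x8b"            // ID1, ID2
            . "\x08"                // CM = 8 (DEFLATE)
            . "\x00"                // FLG: no optional fields
            . "\x00\x00\x00\x00"    // MTIME = 0
            . "\x00"                // XFL (assumed: no extra flags)
            . "\xff";               // OS = 255 (assumed: unknown)

    $body = gzdeflate($data, $level);   // raw DEFLATE, no header, no Adler-32

    // CRC32 and ISIZE, each as a 32-bit little-endian value.
    $trailer = pack('V', crc32($data)) . pack('V', strlen($data));

    return $header . $body . $trailer;
}

$html = "<p>example</p>\n<!-- Compressed by gzip -->";  // illustrative input
$gz   = build_gzip_member($html);

// PHP's own GZIP decoder should give back the original string.
var_dump(gzdecode($gz) === $html);
```

If no control over the header fields is needed, gzencode($html, 9) produces a complete GZIP member in a single call.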
Gumbo answered Jan 30 '23 01:01