
Inconsistent results in compressing with zlib between Win32 and Linux-64 bit

I'm using zlib in a program and noticed a one-byte difference in how "foo" is compressed on Windows (1F8B080000000000000A4BCBCF07002165738C03000000) and Linux (1F8B08000000000000034BCBCF07002165738C03000000). Both decompress back to "foo".

I decided to check outside our code to see if the implementation was correct and used the test programs in the zlib repository to double check. I got the same results:

Linux: echo -n foo | ./minigzip64 > text.txt

Windows: echo|set /p="foo" | minigzip > text.txt

What would account for this difference? Is it a problem?

1F8B 0800 0000 0000 000 *3/A* 4BCB CF07 0021 6573 8C03 0000 00

kealist asked Jul 10 '17 03:07


2 Answers

First off, if it decompresses to what was compressed, then it's not a problem. Different compressors, or the same compressor at different settings, or even the same compressor with the same settings, but different versions, can produce different compressed output from the same input.

Second, the compressed data in this case is identical. Only the last byte of the gzip header that precedes the compressed data is different. That byte identifies the originating operating system. Hence it rightly varies between Linux and Windows.
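
To see that only this one header byte differs, the two hex dumps from the question can be compared byte by byte. This is a minimal Python sketch; the variable names are illustrative and not part of the original question.

```python
import gzip

# Hex dumps copied from the question.
windows = bytes.fromhex("1F8B080000000000000A4BCBCF07002165738C03000000")
linux = bytes.fromhex("1F8B08000000000000034BCBCF07002165738C03000000")

# Offsets at which the two streams disagree.
diff_offsets = [i for i, (w, l) in enumerate(zip(windows, linux)) if w != l]
print(diff_offsets)  # [9]: the OS byte, last byte of the 10-byte gzip header

# Both streams decode to the same payload.
print(gzip.decompress(windows), gzip.decompress(linux))
```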

Even on the same operating system, the header can vary, since it carries a modification date and time. However, in both of your cases the modification date and time were left out (set to zeros).
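
The effect of the timestamp can be reproduced with Python's gzip module, assuming Python 3.8 or later, where `gzip.compress` accepts an `mtime` keyword. The timestamps 1 and 2 below are arbitrary values chosen for illustration.

```python
import gzip
import struct

# Same input and level, two different (arbitrary) header timestamps.
a = gzip.compress(b"foo", mtime=1)
b = gzip.compress(b"foo", mtime=2)

# MTIME is a little-endian 32-bit integer at offsets 4..7 of the header.
mtime_a, = struct.unpack_from("<I", a, 4)
mtime_b, = struct.unpack_from("<I", b, 4)
print(mtime_a, mtime_b)

# Everything after the 10-byte header (deflate data, CRC-32, size) matches.
print(a[10:] == b[10:])
```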

Mark Adler answered Oct 03 '22 22:10


Just to add to the accepted answer: I got curious and tried it out myself, saving the raw data and opening it with 7-Zip:

Windows: [screenshot: gzip-win]

Linux: [screenshot: gzip-linux]

You can immediately notice that the only field that's different is the Host OS.

What the data means

Header                 Data         Footer
1F8B080000000000000A | 4BCBCF0700 | 2165738C03000000

Let's break that down.

Header

First, going by this answer, the data actually starts with a gzip header rather than a zlib header:

Level   ZLIB    GZIP 
  1   | 78 01 | 1F 8B 
  9   | 78 DA | 1F 8B 
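
The two wrappers can be told apart by their first two bytes. As a rough sketch, Python's `zlib.compressobj` selects the wrapper through its `wbits` argument (15 for zlib, 16 + 15 for gzip); the helper name `first_two_bytes` is made up for this example, and the exact second zlib byte may depend on the zlib build in use.

```python
import zlib

def first_two_bytes(level: int, wbits: int) -> str:
    """Compress b"foo" and return the first two output bytes as hex."""
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    out = comp.compress(b"foo") + comp.flush()
    return out[:2].hex()

for level in (1, 9):
    zlib_magic = first_two_bytes(level, 15)       # wbits=15: zlib wrapper
    gzip_magic = first_two_bytes(level, 16 + 15)  # wbits=31: gzip wrapper
    print(level, zlib_magic, gzip_magic)
```

The zlib header encodes the compression level in its second byte, while the gzip signature is the same at every level.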

Further searching led me to an article about gzip on the Forensics Wiki. The values in this case (the Windows output) are:

Offset   Size   Value      Description
0      | 2    | 1F8B     | Signature (identification bytes 1 and 2)
2      | 1    | 08       | Compression method (deflate)
3      | 1    | 00       | Flags
4      | 4    | 00000000 | Last modification time
8      | 1    | 00       | Compression flags (extra flags)
9      | 1    | 0A       | Operating system (TOPS-20; the Linux output has 03, Unix)
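
The same breakdown can be done programmatically. The sketch below unpacks the fixed 10-byte header with `struct`; the `OS_NAMES` dictionary is only a partial, hand-copied subset of the operating system codes listed in RFC 1952.

```python
import struct

# Windows header bytes from the question.
header = bytes.fromhex("1F8B080000000000000A")

# Fixed 10-byte gzip header: magic, method, flags, mtime, extra flags, OS.
magic, method, flags, mtime, xfl, os_code = struct.unpack("<2sBBIBB", header)

# Partial subset of the OS codes listed in RFC 1952.
OS_NAMES = {
    0: "FAT filesystem",
    3: "Unix",
    10: "TOPS-20",
    11: "NTFS filesystem",
    255: "unknown",
}

print(magic.hex(), method, flags, mtime, xfl, OS_NAMES.get(os_code, os_code))
```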

Footer

Offset   Size   Value      Description
0      | 4    | 2165738C | Checksum (CRC-32, little endian)
4      | 4    | 03000000 | Uncompressed data size in bytes (little endian, i.e. 3)

One interesting thing to note here is that even when the last modification time and operating system fields in the header differ, the input compresses to the same data with the same checksum in the footer.
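
The footer can be checked independently of the header. As a small sketch, the deflate data is decompressed as a raw stream (negative `wbits` means no wrapper) and its CRC-32 and length are compared with the stored values:

```python
import struct
import zlib

# Deflate data and footer from the question (identical on both platforms).
body = bytes.fromhex("4BCBCF0700")
trailer = bytes.fromhex("2165738C03000000")

# Footer: CRC-32 and uncompressed size, both little-endian 32-bit integers.
stored_crc, stored_size = struct.unpack("<II", trailer)

# wbits=-15 tells zlib to expect a raw deflate stream with no header.
payload = zlib.decompress(body, -15)

print(payload, hex(stored_crc), stored_size)
print(zlib.crc32(payload) == stored_crc, len(payload) == stored_size)
```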

The IETF RFC (RFC 1952) has a more detailed description of the format.

Geeky I answered Oct 03 '22 23:10