Using zlib in a program, I noticed a one-byte difference in how "foo" is compressed on Windows and Linux:
Windows: 1F8B080000000000000A4BCBCF07002165738C03000000
Linux: 1F8B08000000000000034BCBCF07002165738C03000000
Both decompress back to "foo".
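A quick way to confirm where the two outputs differ is to decode both hex strings and compare them byte by byte. A minimal sketch in Python using only the standard library, with the hex strings copied from the question:

```python
import gzip

# Hex dumps from the question.
WINDOWS = bytes.fromhex("1F8B080000000000000A4BCBCF07002165738C03000000")
LINUX = bytes.fromhex("1F8B08000000000000034BCBCF07002165738C03000000")

# Both are valid gzip streams; gzip.decompress also verifies the CRC-32 and size.
assert gzip.decompress(WINDOWS) == b"foo"
assert gzip.decompress(LINUX) == b"foo"

# Collect every offset where the two streams disagree.
diffs = [(i, w, l) for i, (w, l) in enumerate(zip(WINDOWS, LINUX)) if w != l]
for offset, w, l in diffs:
    print(f"offset {offset}: Windows 0x{w:02X}, Linux 0x{l:02X}")
```

It should report a single differing byte, at offset 9 (0x0A vs 0x03).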
I decided to check outside our code to see if the implementation was correct, using the test programs in the zlib repository to double-check. I got the same results:
Linux: echo -n foo | ./minigzip64 > text.txt
Windows: echo|set /p="foo" | minigzip > text.txt
What would account for this difference? Is it a problem?
1F8B 0800 0000 0000 000 *3/A* 4BCB CF07 0021 6573 8C03 0000 00
First off, if it decompresses to what was compressed, then it's not a problem. Different compressors, or the same compressor at different settings, or even the same compressor with the same settings, but different versions, can produce different compressed output from the same input.
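To illustrate with Python's zlib bindings (chosen here only as a convenient example): compressing the same input at two different levels gives different bytes, yet both decompress to the same data. In this case the zlib header alone already differs, because it records the compression level. The sample input is arbitrary.

```python
import zlib

data = b"foo bar baz " * 50  # arbitrary sample input

fast = zlib.compress(data, 1)  # level 1: fastest
best = zlib.compress(data, 9)  # level 9: best compression

# The streams differ (at minimum in the second header byte, which encodes the level)...
print(fast[:2].hex(), best[:2].hex())
print(fast == best)

# ...but both round-trip to the original input.
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data
```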
Second, the compressed data in this case is identical. Only the last byte of the gzip header that precedes the compressed data is different. That byte identifies the originating operating system. Hence it rightly varies between Linux and Windows.
Even on the same operating system, the header can vary since it carries a modification date and time. However, in both your cases the modification date and time were left out (set to zeros).
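To see that the OS byte and the modification time live purely in the header, here is a sketch that assembles a gzip member by hand following the RFC 1952 layout. `gzip_member` is a hypothetical helper written for this illustration, and it assumes compression level 6 with an XFL byte of 0, which matches the two dumps in the question.

```python
import struct
import zlib

def gzip_member(data: bytes, os_code: int, mtime: int = 0) -> bytes:
    """Hypothetical helper: build one gzip member (RFC 1952) by hand."""
    header = struct.pack(
        "<2sBBIBB",
        b"\x1f\x8b",  # ID1, ID2: gzip magic
        8,            # CM: 8 = deflate
        0,            # FLG: no file name, comment, extra field or header CRC
        mtime,        # MTIME: 0 = no time stamp
        0,            # XFL: 0, matching the dumps in the question
        os_code,      # OS: 3 = Unix; the Windows output in the question has 10 (0x0A)
    )
    # Negative wbits asks zlib for raw deflate data with no zlib wrapper.
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
    body = compressor.compress(data) + compressor.flush()
    # Trailer: CRC-32 and length of the *uncompressed* data, little-endian.
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + body + trailer

linux = gzip_member(b"foo", os_code=3)
windows = gzip_member(b"foo", os_code=10)
print(linux.hex().upper())
print(windows.hex().upper())
```

With a stock zlib this should reproduce the two dumps from the question; only the last header byte changes between them. Passing a non-zero `mtime` would change bytes 4 to 7 in the same way, without touching the compressed data.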
Just to add to the accepted answer: I got curious and tried it out myself, saving the raw data and opening it with 7-Zip.
You can immediately see that the only field that differs is the Host OS.
Header | Data | Footer
1F8B080000000000000A | 4BCBCF0700 | 2165738C03000000
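The three-way split above can be reproduced in code, assuming a header with no optional fields (FLG = 0), so that it is exactly 10 bytes, followed by the fixed 8-byte footer. The middle part is a raw deflate stream, which Python's zlib can inflate on its own when given a negative `wbits`:

```python
import zlib

stream = bytes.fromhex("1F8B080000000000000A4BCBCF07002165738C03000000")

# Assumes FLG == 0: no optional header fields, so the header is a fixed 10 bytes.
assert stream[3] == 0, "optional header fields present; this sketch does not parse them"
header, data, footer = stream[:10], stream[10:-8], stream[-8:]

print(header.hex().upper(), data.hex().upper(), footer.hex().upper(), sep=" | ")

# wbits=-15: decode raw deflate data with no zlib or gzip wrapper.
print(zlib.decompress(data, -15))
```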
Let's break that down.
First, from this answer I realized it's actually a gzip header rather than a zlib header:
Level | zlib header | gzip header
1 | 78 01 | 1F 8B
9 | 78 DA | 1F 8B
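Both wrappers can come from the same library. With Python's zlib module (used here as an illustration), `wbits` selects the container: 15 produces a zlib stream whose second byte depends on the level, while 31 (16 + 15) asks for a gzip wrapper instead. `compress` is a small helper defined just for this sketch.

```python
import zlib

def compress(data: bytes, level: int, wbits: int) -> bytes:
    """Helper for this sketch: one-shot compression with an explicit wbits."""
    c = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()

for level in (1, 9):
    zlib_stream = compress(b"foo", level, 15)  # zlib wrapper
    gzip_stream = compress(b"foo", level, 31)  # gzip wrapper
    print(level, zlib_stream[:2].hex(" ").upper(), gzip_stream[:2].hex(" ").upper())
```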
Further searching led me to an article about gzip on the Forensics Wiki. The values in this case are:
Header:
Offset | Size | Value | Description
0 | 2 | 1F8B | Signature (identification bytes 1 and 2)
2 | 1 | 08 | Compression method (deflate)
3 | 1 | 00 | Flags
4 | 4 | 00000000 | Last modification time (0 = not set)
8 | 1 | 00 | Compression flags (extra flags)
9 | 1 | 0A | Operating system (10 = TOPS-20 in RFC 1952; the Windows build wrote this value, while the Linux output has 03 = Unix)
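The header fields in the table can be decoded with `struct`. A sketch, assuming FLG = 0 as here (no optional fields after the fixed 10 bytes); the OS names are those listed in RFC 1952, while `OS_NAMES` and `parse_gzip_header` are illustrative names:

```python
import struct

# OS codes as listed in RFC 1952, section 2.3.1.
OS_NAMES = {
    0: "FAT", 1: "Amiga", 2: "VMS", 3: "Unix", 4: "VM/CMS", 5: "Atari TOS",
    6: "HPFS", 7: "Macintosh", 8: "Z-System", 9: "CP/M", 10: "TOPS-20",
    11: "NTFS", 12: "QDOS", 13: "Acorn RISCOS", 255: "unknown",
}

def parse_gzip_header(stream: bytes) -> dict:
    """Decode the fixed 10-byte gzip header (optional fields are not handled)."""
    magic, cm, flg, mtime, xfl, os_code = struct.unpack("<2sBBIBB", stream[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return {
        "method": cm,        # 8 = deflate
        "flags": flg,
        "mtime": mtime,      # Unix time; 0 = not set
        "extra_flags": xfl,
        "os": OS_NAMES.get(os_code, f"code {os_code}"),
    }

windows = parse_gzip_header(bytes.fromhex("1F8B080000000000000A4BCBCF07002165738C03000000"))
linux = parse_gzip_header(bytes.fromhex("1F8B08000000000000034BCBCF07002165738C03000000"))
print(windows)
print(linux)
```

For the Windows dump this reports TOPS-20, the RFC 1952 name for code 10, even though the file came from Windows; the byte is simply whatever the compressing library chose to write.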
Footer:
Offset | Size | Value | Description
0 | 4 | 2165738C | Checksum (CRC-32 of the uncompressed data), little-endian, i.e. 0x8C736521
4 | 4 | 03000000 | Uncompressed data size in bytes, little-endian, i.e. 3
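The footer fields can be verified directly: both are little-endian 32-bit integers, and they should match a CRC-32 and length computed over the original input. A minimal sketch with Python's `zlib.crc32`:

```python
import struct
import zlib

stream = bytes.fromhex("1F8B080000000000000A4BCBCF07002165738C03000000")

# Last 8 bytes: CRC-32, then ISIZE (input length modulo 2**32), both little-endian.
crc, isize = struct.unpack("<II", stream[-8:])
print(f"CRC-32 0x{crc:08X}, size {isize}")

# Recompute both from the uncompressed payload.
assert crc == zlib.crc32(b"foo")
assert isize == len(b"foo") % 2**32
```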
An interesting thing to note is that even when the last modification time and operating system fields in the header differ, the compressed data is the same, and so is the checksum in the footer, because the CRC-32 is computed over the uncompressed data only.
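As a quick check of that point (a sketch; the replacement timestamp is arbitrary), rewriting the MTIME and OS fields of the Linux stream leaves a stream that still decompresses and passes the CRC check, with the deflate data and footer untouched:

```python
import gzip
import struct

linux = bytes.fromhex("1F8B08000000000000034BCBCF07002165738C03000000")

# Overwrite MTIME (bytes 4-7) with an arbitrary timestamp and OS (byte 9) with 10.
patched = bytearray(linux)
patched[4:8] = struct.pack("<I", 1_600_000_000)
patched[9] = 10
patched = bytes(patched)

# Everything after the 10-byte header is unchanged...
assert patched[10:] == linux[10:]
# ...and gzip.decompress still accepts it, because the CRC-32 covers only the payload.
print(gzip.decompress(patched))
```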
RFC 1952, the IETF specification of the gzip format, has a more detailed description of the format.