According to the gzip specification, the uncompressed file size is stored in the last 4 bytes of a .gz file.
I have created 2 files with
dd if=/dev/urandom of=500M bs=1024 count=500000
dd if=/dev/urandom of=5G bs=1024 count=5000000
I gzipped them
gzip 500M 5G
I checked the last 4 bytes with
tail -c4 500M.gz|od -I (returns 512000000 as expected)
tail -c4 5G.gz|od -I (returns 825032704, which is not what I expected)
It seems that crossing the 32-bit barrier turns the value written into ISIZE into complete nonsense, which is more annoying than if they had used some error bit instead.
Does anyone know of a way to get the uncompressed .gz filesize from the .gz without extracting it?
thanks
specification: http://www.gzip.org/zlib/rfc-gzip.html
edit: if anyone wants to try it out, you could use /dev/zero instead of /dev/urandom
There isn't one.
The only way to get the exact uncompressed size of a gzip stream is to actually decompress it (even if you write everything to /dev/null and just count the bytes).
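As a minimal sketch of that approach, the stream can be decompressed to stdout and piped into a byte counter, so nothing is written to disk. The file name `sample` and the temporary directory are placeholders made up for this example; substitute your own .gz file.

```shell
# Create a small sample file (hypothetical name) so the example is self-contained.
tmpdir=$(mktemp -d)
head -c 100000 /dev/zero > "$tmpdir/sample"
gzip "$tmpdir/sample"    # produces sample.gz and removes sample

# Decompress to stdout and count the bytes; no extracted file is created.
# tr strips the padding that some wc implementations put before the number.
size=$(gzip -dc "$tmpdir/sample.gz" | wc -c | tr -d ' ')
echo "uncompressed size: $size bytes"

# Clean up the temporary files.
rm -r "$tmpdir"
```

This still costs a full decompression pass in CPU time, but it needs no extra disk space, which is usually the real concern with multi-gigabyte archives.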
It's worth noting that ISIZE is defined as
ISIZE (Input SIZE)
This contains the size of the original (uncompressed) input
data modulo 2^32.
in the gzip RFC, so it isn't actually breaking at the 32-bit barrier; what you're seeing is expected behavior.
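The number from the question can be checked against that definition with a bit of shell arithmetic. This is only a sketch, and it assumes a shell with 64-bit integer arithmetic (bash, or most shells on a 64-bit system); the variable names are made up for the example.

```shell
# Size of the 5G file from the question: 5,000,000 blocks of 1024 bytes.
actual=$(( 5000000 * 1024 ))

# ISIZE holds only the size modulo 2^32 (4294967296).
isize=$(( actual % 4294967296 ))

echo "actual=$actual isize=$isize"
```

The result, 825032704, is exactly what the trailer of the 5G file contains: the true size of 5120000000 bytes minus one full wrap of 2^32.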