 

Find gzip start and end?

Tags: file, gzip, archive

I have a file that contains random bytes and multiple gzip streams, with many random bytes between the streams. How can I find the start and end of each gzip stream inside the file? Basically, I need to find every gzip stream and extract it.

Fedcomp asked Oct 28 '12 20:10



1 Answer

From RFC 1952 (GZIP file format specification):

Each GZIP file is a sequence of data chunks called members (compressed data sets), one for each compressed file it contains.

Each member starts with the following bytes:

  • 0x1F (ID1)
  • 0x8B (ID2)
  • compression method. 0x08 for a DEFLATEd file. 0-7 are reserved values.
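  • flags. The top three bits are reserved and must be zero. The other bits announce optional fields (extra field, file name, comment, header CRC) that follow these fixed bytes.
  • (4 bytes) last modified time, in Unix format, least significant byte first. May be set to 0 (no timestamp).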
  • extra flags, defined by the compression method.
  • operating system, actually the file system. 0=FAT, 3=UNIX, 11=NTFS
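The fixed header above can be checked cheaply before attempting any decompression. The following is a minimal Python sketch, assuming the whole input fits in memory; `parse_fixed_header` and `find_candidate_headers` are hypothetical helper names, not part of any library. A match is only a candidate: random bytes can pass these checks by coincidence.

```python
from __future__ import annotations

import struct

# Fixed part of a gzip member header (RFC 1952, section 2.3), 10 bytes:
# ID1, ID2, CM, FLG (1 byte each), MTIME (4 bytes, little-endian), XFL, OS.
HEADER_FORMAT = "<BBBBIBB"
HEADER_LEN = struct.calcsize(HEADER_FORMAT)  # 10
MAGIC = b"\x1f\x8b"
CM_DEFLATE = 8
RESERVED_FLAG_BITS = 0xE0  # top three bits of FLG must be zero


def parse_fixed_header(data: bytes, pos: int) -> dict | None:
    """Unpack the 10 fixed header bytes at `pos`, or return None if they
    fail the ID, compression-method or reserved-flag checks."""
    if pos < 0 or pos + HEADER_LEN > len(data):
        return None
    id1, id2, cm, flg, mtime, xfl, os_id = struct.unpack_from(HEADER_FORMAT, data, pos)
    if (id1, id2) != (0x1F, 0x8B) or cm != CM_DEFLATE or flg & RESERVED_FLAG_BITS:
        return None
    return {"flags": flg, "mtime": mtime, "xfl": xfl, "os": os_id}


def find_candidate_headers(data: bytes) -> list[int]:
    """Offsets where a plausible gzip member header starts. These are only
    candidates: random bytes can satisfy these checks by chance."""
    candidates = []
    pos = data.find(MAGIC)
    while pos != -1:
        if parse_fixed_header(data, pos) is not None:
            candidates.append(pos)
        pos = data.find(MAGIC, pos + 1)
    return candidates
```

Because the permitted flag bits can announce optional fields, the real header may be longer than 10 bytes, so the compressed data does not necessarily begin at offset +10 from a candidate.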

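The end of a member is not delimited, and the header carries no length. The compressed blocks are followed by an 8-byte trailer (CRC32 and ISIZE, the uncompressed size modulo 2^32), but the only way to find where the compressed data ends is to actually decompress (walk) the entire member. Note that concatenating multiple valid GZIP files creates a valid GZIP file. Also note that reading past the end of a member may still decode that member successfully, because many decompression libraries stop at the end of the member and ignore or report the trailing bytes instead of failing outright.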
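Since there is no length field, one practical way to locate the end of a member is to let a decompressor walk it. Below is a minimal Python sketch, assuming the input fits in memory, built on the standard `zlib` module: a `zlib.decompressobj` created with `wbits=16 + zlib.MAX_WBITS` accepts only the gzip wrapper, checks the CRC32/ISIZE trailer, sets `eof` when the member ends, and keeps any following bytes in `unused_data`. `decode_member` and `extract_members` are hypothetical names, and trying every `1F 8B` offset is just one possible search strategy.

```python
from __future__ import annotations

import zlib

MAGIC = b"\x1f\x8b"


def decode_member(data: bytes, start: int) -> tuple[int, bytes] | None:
    """Try to decompress one gzip member beginning at `start`.

    Returns (end_offset, payload), where end_offset is just past the member's
    8-byte trailer, or None if no complete, valid member starts there."""
    # 16 + MAX_WBITS: accept only the gzip wrapper (header + deflate + trailer).
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    tail = data[start:]
    try:
        payload = d.decompress(tail)
    except zlib.error:
        # Bad header, invalid deflate data, or CRC32/ISIZE trailer mismatch.
        return None
    if not d.eof:
        return None  # input ran out before the member ended (truncated)
    # Bytes zlib did not consume belong to whatever follows this member.
    return start + len(tail) - len(d.unused_data), payload


def extract_members(data: bytes) -> list[tuple[int, int, bytes]]:
    """Scan `data` for gzip members separated by arbitrary bytes.
    Returns (start, end, payload) for each member that decodes cleanly."""
    members = []
    pos = data.find(MAGIC)
    while pos != -1:
        result = decode_member(data, pos)
        if result is None:
            pos = data.find(MAGIC, pos + 1)  # false candidate: try the next one
        else:
            end, payload = result
            members.append((pos, end, payload))
            pos = data.find(MAGIC, end)  # resume scanning after this member
    return members
```

A false candidate usually fails within a few bytes (wrong method or flag byte), but one with a plausible header may consume input until the deflate decoder rejects it, so this can be slow on large files. This approach also splits a plain concatenation of gzip files into its individual members.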

John Dvorak answered Sep 28 '22 12:09
