How to detect type of compression used on the file? (if no file extension is specified)

How can one detect the type of compression used on a file, assuming that no extension such as .zip, .gz, or .xz is specified?

Is this information stored somewhere in the header of that file?

22332112 asked Oct 01 '13 15:10


2 Answers

You can determine that it is likely to be one of those formats by looking at the first few bytes. You should then test to see if it really is one of those, using an integrity check from the associated utility for that format, or by actually proceeding to decompress.
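The "test to see if it really is one of those" step can be sketched in Python by simply trying to decompress the data with the matching standard-library module. This is only a rough illustration: the function name `verify` and the `kind` labels are made up for this example, and only formats that the Python standard library can read are covered.

```python
import bz2
import gzip
import io
import lzma
import zipfile
import zlib


def verify(data: bytes, kind: str) -> bool:
    """Return True if `data` decompresses cleanly as `kind` (hypothetical labels)."""
    try:
        if kind == "gzip":
            gzip.decompress(data)
        elif kind == "bzip2":
            bz2.decompress(data)
        elif kind == "xz":
            lzma.decompress(data, format=lzma.FORMAT_XZ)
        elif kind == "zlib":
            zlib.decompress(data)
        elif kind == "zip":
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                # testzip() returns the first corrupt member name, or None.
                return zf.testzip() is None
        else:
            return False  # format not handled by this sketch
    except (OSError, EOFError, ValueError, zlib.error,
            lzma.LZMAError, zipfile.BadZipFile):
        return False
    return True
```

For large files, decompressing everything into memory is wasteful; streaming the data through the decompressor and discarding the output would be the more practical variant.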

You can find the header formats in the descriptions:

  • Zip (.zip) format description, starts with 0x50, 0x4b, 0x03, 0x04 (unless empty — then the last two are 0x05, 0x06 or 0x06, 0x06)
  • Gzip (.gz) format description, starts with 0x1f, 0x8b, 0x08
  • xz (.xz) format description, starts with 0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00

Others:

  • zlib (.zz) format description, starts with two bytes (in bits) 0aaa1000 bbbccccc, where ccccc is chosen so that the first byte times 256 plus the second byte, both read as unsigned integers, is a multiple of 31. For example, 01111000 (bits) = 120 and 10011100 (bits) = 156; 120 * 256 + 156 = 30876, which is a multiple of 31
  • compress (.Z) starts with 0x1f, 0x9d
  • bzip2 (.bz2) starts with 0x42, 0x5a, 0x68
  • Zstandard (.zst) format description, a frame starts with the 4-byte magic number 0xFD2FB528 stored in little-endian order (that is, bytes 0x28, 0xb5, 0x2f, 0xfd), a skippable frame starts with 0x184D2A5? (the question mark is any value from 0 to F), and a dictionary starts with 0xEC30A437.
  • A few more formats in the magic database from the file command
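As a rough sketch, the signatures listed above can be checked against the first few bytes of a file like this. The names `SIGNATURES`, `guess_compression`, and `guess_file` are invented for the example, and the table only covers the formats mentioned in this answer. A match only means the file is likely that format.

```python
from typing import Optional

# (magic prefix, label) pairs taken from the list above; labels are illustrative.
SIGNATURES = [
    (b"\x50\x4b\x03\x04", "zip"),
    (b"\x50\x4b\x05\x06", "zip"),              # empty archive
    (b"\x50\x4b\x06\x06", "zip"),              # empty archive, other ending
    (b"\x1f\x8b\x08", "gzip"),
    (b"\xfd\x37\x7a\x58\x5a\x00", "xz"),
    (b"\x1f\x9d", "compress"),
    (b"\x42\x5a\x68", "bzip2"),
    (b"\x28\xb5\x2f\xfd", "zstd"),             # 0xFD2FB528 stored little-endian
    (b"\x37\xa4\x30\xec", "zstd dictionary"),  # 0xEC30A437 stored little-endian
]


def guess_compression(head: bytes) -> Optional[str]:
    """Guess the format from the leading bytes; None if nothing matches."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    # zstd skippable frame: 0x184D2A5?, little-endian, low nibble is arbitrary.
    if len(head) >= 4 and (head[0] & 0xF0) == 0x50 and head[1:4] == b"\x2a\x4d\x18":
        return "zstd skippable frame"
    # zlib: first byte is 0aaa1000 and the 16-bit header is a multiple of 31.
    if len(head) >= 2 and (head[0] & 0x8F) == 0x08 and (head[0] * 256 + head[1]) % 31 == 0:
        return "zlib"
    return None


def guess_file(path: str) -> Optional[str]:
    """Read just enough of the file to compare against the signatures."""
    with open(path, "rb") as f:
        return guess_compression(f.read(8))
```

The zlib test is deliberately checked last because it is the weakest: two bytes of ordinary data can satisfy it by chance, so a positive result there especially needs confirming by decompression.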
Mark Adler answered Sep 19 '22 05:09


If you're on a Linux box, just use the 'file' command.

http://en.wikipedia.org/wiki/File_(command)

$ mv foo.zip dink
$ file dink
dink: gzip compressed data, from Unix, last modified: Sat Aug  6 08:08:57 2011, max compression
$
ct_ answered Sep 20 '22 05:09