
What is a good way to test a file to see if it's a zip file?

I am looking at a new file format specification, and the specification says the file can be either XML-based or a zip file containing an XML file and other files.

The file extension is the same in both cases. How could I test the file to decide whether it needs decompressing or can just be read?

Phil Hannent asked Dec 11 '09 10:12


4 Answers

The zip file format is defined by PKWARE, which publishes the file specification as APPNOTE.TXT.

Near the top you will find the header specification:

A. Local file header:

    local file header signature     4 bytes  (0x04034b50)
    version needed to extract       2 bytes
    general purpose bit flag        2 bytes
    compression method              2 bytes
    last mod file time              2 bytes
    last mod file date              2 bytes
    crc-32                          4 bytes
    compressed size                 4 bytes
    uncompressed size               4 bytes
    file name length                2 bytes
    extra field length              2 bytes

    file name (variable size)
    extra field (variable size)

From this you can see that the first 4 bytes of the file should be the local file header signature, the hex value 0x04034b50. The bytes appear in the file in the opposite order: PKWARE specifies that "All values are stored in little-endian byte order unless otherwise specified.", so if you view the file in a hex editor you will see 50 4b 03 04 as the first 4 bytes.

You can use this to check whether your file is a zip file. If you open the file in Notepad, you will see that the first two bytes (50 and 4b) are the ASCII characters PK.
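As a rough illustration of the header layout quoted above, here is a minimal Python sketch that unpacks the 30 fixed bytes of a local file header as little-endian values and checks the signature. The function name read_local_header and the dictionary keys are made up for this example; only the field sizes and the signature value come from the specification excerpt.

```python
import struct

# Fixed part of the local file header, little-endian ("<"):
# one 4-byte field, five 2-byte fields, three 4-byte fields, two 2-byte fields.
LOCAL_HEADER = struct.Struct("<I5H3I2H")  # 30 bytes in total
LOCAL_HEADER_SIGNATURE = 0x04034B50       # stored on disk as 50 4b 03 04

# Hypothetical key names, in the same order as the specification excerpt.
FIELD_NAMES = (
    "signature", "version_needed", "flags", "compression_method",
    "last_mod_time", "last_mod_date", "crc32", "compressed_size",
    "uncompressed_size", "file_name_length", "extra_field_length",
)


def read_local_header(data):
    """Return the fixed header fields as a dict, or None if they do not match."""
    if len(data) < LOCAL_HEADER.size:
        return None  # too short to hold a local file header
    fields = dict(zip(FIELD_NAMES, LOCAL_HEADER.unpack_from(data)))
    if fields["signature"] != LOCAL_HEADER_SIGNATURE:
        return None  # first 4 bytes are not 50 4b 03 04
    return fields
```

Passing the first 30 bytes of the file is enough; the variable-length file name and extra field follow immediately after them.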

Simon P Stevens answered Nov 12 '22 02:11


You could look at the magic number of the file. The ones for ZIP archives are listed on the ZIP format Wikipedia page: PK\003\004 for an archive with entries, or PK\005\006 for an empty archive.
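A small Python sketch of that check, assuming it is enough to compare the first four bytes against those two magic numbers. The helper name has_zip_magic is hypothetical.

```python
# The two magic numbers mentioned above, written as byte strings
# (\003\004 and \005\006 in octal are \x03\x04 and \x05\x06 in hex).
ZIP_MAGIC_NUMBERS = (
    b"PK\x03\x04",  # local file header: archive with at least one entry
    b"PK\x05\x06",  # end of central directory record: empty archive
)


def has_zip_magic(path):
    """Return True if the file at `path` starts with a known ZIP magic number."""
    with open(path, "rb") as f:
        return f.read(4) in ZIP_MAGIC_NUMBERS
```

This only inspects the start of the file, so it will not recognise archives that have other data prepended, such as self-extracting executables.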

Amber answered Nov 12 '22 02:11


Check the first few bytes of the file for the magic number. Zip files begin with PK (50 4B). Since a valid XML file cannot start with these characters, you can be fairly sure of the file type.
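To tie this back to the question (decompress or just read), here is a hedged Python sketch that sniffs the PK prefix and returns the XML bytes either way. The function name read_xml_payload and the rule for choosing which member to read (the first name ending in .xml) are assumptions for illustration; the real format specification would say which file inside the archive to use.

```python
import zipfile


def read_xml_payload(path, member=None):
    """Return the XML bytes from `path`, whether it is plain XML or a zip."""
    with open(path, "rb") as f:
        is_zip = f.read(2) == b"PK"  # valid XML cannot start with "PK"

    if not is_zip:
        # Plain XML: just read the file.
        with open(path, "rb") as f:
            return f.read()

    # Zip container: decompress the XML member.
    with zipfile.ZipFile(path) as archive:
        if member is None:
            # Assumption: take the first member whose name ends in .xml.
            xml_names = [n for n in archive.namelist() if n.lower().endswith(".xml")]
            if not xml_names:
                raise ValueError("no XML member found in archive")
            member = xml_names[0]
        return archive.read(member)
```

The caller gets the same bytes back in both cases and can hand them to whatever XML parser it already uses.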

Yacoby answered Nov 12 '22 02:11


You can use the file command to see whether it's a text file (XML) or a zip archive.

ccheneson answered Nov 12 '22 01:11