 

Why does zipping the same content twice give two files with different SHA1s?


I have run into a strange problem with git and zip files. My build script takes a bunch of HTML documentation pages and zips them into docs.zip, which I then check into git.

The problem is that every time I re-run the build script, the new zip file has a different SHA1 from the one produced by the previous run. My build script calls the Ant zip task. However, manually running zip from the Mac OS X shell on the same directory twice also gives me two different SHA1s.

Run 1:

    zip foo.zip *
    openssl sha1 foo.zip
    rm foo.zip

Run 2:

    zip foo.zip *
    openssl sha1 foo.zip

Run 1 and Run 2 give different SHA1s even though the content did not change between runs. In both cases zip lists exactly the same files being added, and it does not indicate that any OS-specific files such as .DS_Store are being included in the zip file.
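Identical file content can still produce different archive bytes when the metadata stored alongside it changes. The Python sketch below uses only the standard-library `zipfile` module. `zip_bytes` and the sample page are hypothetical. It zips the same bytes twice and changes nothing except the modification time recorded for the entry. The two archives then hash differently.

```python
import hashlib
import io
import zipfile

def zip_bytes(files, date_time):
    """Build an in-memory zip from `files` (name -> bytes), stamping every
    entry with the given (Y, M, D, h, m, s) modification time."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
    return buf.getvalue()

docs = {"index.html": b"<html>hello</html>"}  # hypothetical stand-in content
a = zip_bytes(docs, (2012, 3, 15, 4, 3, 0))
b = zip_bytes(docs, (2012, 3, 15, 4, 5, 0))  # same content, two minutes later

print(hashlib.sha1(a).hexdigest() == hashlib.sha1(b).hexdigest())  # False
```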

Is the zip algorithm deterministic? If it is run on the same content, will it produce exactly the same bits? If not, why not?
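The compression step on its own is usually not where the variation comes from. The minimal Python sketch below assumes a single zlib build and a fixed compression level. Under those conditions, compressing the same bytes twice gives byte-identical output. `zlib.compress` puts a small zlib header and checksum around the DEFLATE stream, while zip entries store raw DEFLATE data, but the same algorithm produces both. Different zip tools or library versions can still compress the same input differently.

```python
import zlib

# Same input, same compression level, same zlib build.
payload = b"<html>" + b"documentation " * 1000 + b"</html>"
once = zlib.compress(payload, 9)
twice = zlib.compress(payload, 9)

print(once == twice)                     # True
print(zlib.decompress(once) == payload)  # True
```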

What are my choices for zipping the files in a deterministic way? There are thousands of files in the archive, and I don't expect them to change much. I know that git compresses any files you check in anyway; the reason for zipping them is just to keep the mass of them out of the way.
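One option is to build the archive with every field that normally varies set to a fixed value. The sketch below uses Python's standard-library `zipfile` in place of the Ant zip task or the shell `zip` command. `deterministic_zip`, the fixed timestamp, the permissions and the sample pages are all assumptions for illustration. The function sorts the entries and stamps each one with 1980-01-01 00:00:00, the earliest date the zip date field can hold. It also writes fixed permissions and a fixed creator OS. As a result, changing the source files' mtimes, as a rebuild would, leaves the archive bytes unchanged.

```python
import hashlib
import io
import os
import tempfile
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest date the zip (MS-DOS) date field can hold

def deterministic_zip(root):
    """Zip every file under `root` into bytes, pinning what usually varies
    between runs: entry order, timestamps, permissions and creator OS."""
    arcnames = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            rel = os.path.relpath(os.path.join(dirpath, fn), root)
            arcnames.append(rel.replace(os.sep, "/"))  # zip paths use forward slashes
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for arcname in sorted(arcnames):  # stable order, independent of os.walk
            info = zipfile.ZipInfo(arcname, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed Unix permissions (rw-r--r--)
            info.create_system = 3            # record "Unix" as creator on every host
            with open(os.path.join(root, arcname), "rb") as f:
                zf.writestr(info, f.read())
    return buf.getvalue()

# Demo: hash, change the files' mtimes (as a rebuild would), hash again.
with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "api"))
    for name in ("index.html", "api/foo.html"):  # hypothetical doc pages
        with open(os.path.join(d, name), "w") as f:
            f.write("<html>" + name + "</html>")
    first = hashlib.sha1(deterministic_zip(d)).hexdigest()
    for name in ("index.html", "api/foo.html"):
        os.utime(os.path.join(d, name), (1_000_000_000, 1_000_000_000))
    second = hashlib.sha1(deterministic_zip(d)).hexdigest()

print(first == second)  # True
```

Fixing the timestamp means the archive no longer records when the pages were generated. For files whose only purpose is to be checked into git, that is usually an acceptable trade.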

Asked by ams on Mar 15 '12 at 04:03




1 Answer

According to Wikipedia (http://en.wikipedia.org/wiki/Zip_(file_format)), zip files have header fields for each file's last modification time and last modification date. Because those timestamps are stored in the archive, a zip file checked into git can appear to git to have changed when it is rebuilt, even from the same content. There also seems to be no flag that tells zip not to set those headers.
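The timestamp fields mentioned above are two 16-bit MS-DOS values in each entry's header. Years are counted from 1980 and seconds are stored at 2-second resolution. The Python sketch below packs the two values by hand; `dos_datetime` is a hypothetical helper. It then checks them against the bytes that the standard-library `zipfile` writes into the first local file header, where the time and date fields sit at byte offsets 10 and 12.

```python
import io
import struct
import zipfile

def dos_datetime(year, month, day, hour, minute, second):
    """Pack a timestamp into the zip 'last mod file time' and
    'last mod file date' fields (MS-DOS format)."""
    dos_time = (hour << 11) | (minute << 5) | (second // 2)
    dos_date = ((year - 1980) << 9) | (month << 5) | day
    return dos_time, dos_date

stamp = (2012, 3, 15, 4, 3, 0)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(zipfile.ZipInfo("index.html", date_time=stamp), b"<html></html>")

# Local file header: signature(4) version(2) flags(2) method(2) time(2) date(2)
header_time, header_date = struct.unpack_from("<HH", buf.getvalue(), 10)
print((header_time, header_date) == dos_datetime(*stamp))  # True
print(dos_datetime(*stamp))  # (8288, 16495)
```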

I am resorting to just using tar; it seems to produce the same bytes for the same input when run multiple times.
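Tar headers also record each member's modification time, permissions and owner. A tar of regenerated files can therefore differ between builds too. If that becomes a problem, one option is to pin those fields. The sketch below uses Python's standard-library `tarfile`; `deterministic_tar` and the sample pages are hypothetical. It writes an uncompressed tar on purpose, because a gzip wrapper adds a timestamp of its own to its header.

```python
import hashlib
import io
import tarfile

def deterministic_tar(files):
    """Build an uncompressed tar from `files` (name -> bytes), pinning the
    per-member metadata that would otherwise come from the filesystem."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
        for name in sorted(files):  # stable member order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0                # fixed modification time
            info.mode = 0o644             # fixed permissions
            info.uid = info.gid = 0       # fixed numeric owner
            info.uname = info.gname = ""  # no owner names
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

pages = {"index.html": b"<html>index</html>", "api/foo.html": b"<html>foo</html>"}  # hypothetical content
first = hashlib.sha1(deterministic_tar(pages)).hexdigest()
# Same pages supplied in a different order still give the same archive.
second = hashlib.sha1(deterministic_tar(dict(reversed(list(pages.items()))))).hexdigest()
print(first == second)  # True
```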

Answered by ams on Sep 22 '22 at 16:09