
Why do the md5 hashes of two tarballs of the same file differ?

I can run:

echo "asdf" > testfile
tar czf a.tar.gz testfile
tar czf b.tar.gz testfile
md5sum *.tar.gz

and a.tar.gz and b.tar.gz turn out to have different MD5 hashes. The files really do differ, as diff -u a.tar.gz b.tar.gz confirms.

What additional flags do I need to pass to tar so that its output is identical every time it is given the same input?

Marcus Emilsson asked Apr 06 '16 23:04



1 Answer

tar czf outfile infiles is equivalent to

tar cf - infiles | gzip > outfile

The files differ because gzip stores its input's filename and modification time in the header of the compressed file. When the input is a pipe, there is no filename to record, and gzip uses the current time as the modification time, so two archives created at different times contain different bytes.
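To see the stored timestamp directly, the header of a compressed stream can be dumped with od. The following is a minimal sketch, not part of the original answer: it assumes a POSIX shell with mktemp and od available, works in a scratch directory, and uses the hypothetical file name sample.tar.gz. In the gzip format, bytes 0-1 are the magic number and bytes 4-7 are the modification time in seconds since the epoch; a value of 0 means no timestamp was stored.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "asdf" > testfile
tar cf - testfile | gzip > sample.tar.gz

# Bytes 0-1: the gzip magic number, printed as hex.
magic=$(od -An -tx1 -N2 sample.tar.gz | tr -d ' \n')

# Bytes 4-7: MTIME, stored little-endian. od prints it as one unsigned
# 32-bit integer in host byte order, which matches on little-endian machines.
mtime=$(od -An -tu4 -j4 -N4 sample.tar.gz | tr -d ' \n')

echo "magic: $magic"
echo "MTIME: $mtime"
```

If MTIME changes between two runs made a few seconds apart, that field alone is enough to change the hash of the whole file.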

gzip also has a --no-name option (short form -n), which tells it not to store the name and timestamp in the file. If you write the expanded command explicitly instead of using tar's -z option, you can make use of it:

tar cf - testfile | gzip --no-name > a.tar.gz
tar cf - testfile | gzip --no-name > b.tar.gz

I tested this on OS X 10.6.8 and it works.
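One way to check the result on your own system is to build the two archives a couple of seconds apart and compare them byte for byte. This is a sketch under a few assumptions: a POSIX shell, mktemp and cmp available, and an input file whose contents and metadata do not change between the two runs (tar itself records the file's modification time, so touching the file would change the archive regardless of gzip).

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "asdf" > testfile

tar cf - testfile | gzip --no-name > a.tar.gz
sleep 2   # cross a one-second boundary so a stored timestamp would differ
tar cf - testfile | gzip --no-name > b.tar.gz

# cmp -s is silent and exits 0 only when the files are byte-for-byte equal.
if cmp -s a.tar.gz b.tar.gz; then
    result="identical"
else
    result="different"
fi
echo "a.tar.gz and b.tar.gz are $result"
```

cmp is used here instead of md5sum because it is available on both Linux and OS X; identical bytes imply identical hashes.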

Barmar answered Sep 18 '22 07:09