
What is the fastest hash algorithm to check if two files are equal?

Tags: file, hash, crc

What is the fastest way to create a hash function which will be used to check if two files are equal?

Security is not very important.

Edit: I am sending a file over a network connection and want to be sure that the files on both sides are equal.

asked Nov 19 '09 07:11 by eflles



2 Answers

Unless you're using a really complicated and/or slow hash, loading the data from the disk is going to take much longer than computing the hash (unless you use RAM disks or top-end SSDs).

So to compare two files, use this algorithm:

  • Compare sizes
  • Compare dates (be careful here: this can give you the wrong answer; you must test whether this is the case for you or not)
  • Compare the hashes

This allows for a fast fail (if the sizes are different, you know that the files are different).
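The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names `file_digest` and `files_equal`, the SHA-1 default, and the 1 MiB chunk size are all assumptions made for the example. The date check is left out because, as noted above, it can give the wrong answer.

```python
import hashlib
import os


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_equal(path_a, path_b, algorithm="sha1"):
    """Compare two files: sizes first (fast fail), then content hashes."""
    # Step 1: files of different sizes cannot be equal, and no data is read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # Step 2 (dates) is deliberately skipped in this sketch.
    # Step 3: compare the hashes of the contents.
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

For the network case in the question, each side would presumably run `file_digest` on its own copy and exchange only the size and the digest, rather than calling `files_equal` on two local paths.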

To make things even faster, you can compute the hash once and save it in an extra file alongside the main file. Also save the main file's date and size in this extra file, so you can tell quickly when the main file has changed and the hash must be recomputed (or the hash file deleted).
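One possible shape for such a cached hash is sketched below. The `.hash` suffix, the JSON layout, and the name `cached_digest` are hypothetical choices for illustration only; the answer does not prescribe any particular format.

```python
import hashlib
import json
import os


def cached_digest(path, algorithm="sha1"):
    """Return the file's digest, reusing a sidecar file while size and date match."""
    sidecar = path + ".hash"  # hypothetical naming convention
    st = os.stat(path)
    stamp = {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "algorithm": algorithm}

    try:
        with open(sidecar, "r", encoding="utf-8") as f:
            cached = json.load(f)
        # Reuse the stored digest only if size, date, and algorithm are unchanged.
        if isinstance(cached, dict) and all(cached.get(k) == v for k, v in stamp.items()):
            return cached["digest"]
    except (OSError, ValueError, KeyError):
        pass  # missing or unreadable sidecar: fall through and recompute

    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()

    # Store the digest together with the size and date it was computed for.
    with open(sidecar, "w", encoding="utf-8") as f:
        json.dump({**stamp, "digest": digest}, f)
    return digest
```

The same caveat as for comparing dates applies: a file rewritten with the same size and the same modification time would be served a stale digest, so this only helps where timestamps can be trusted.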

answered Oct 15 '22 03:10 by Aaron Digulla


One approach might be to use a simple CRC-32 algorithm and, only if the CRC values compare equal, rerun the comparison with SHA-1 or something more robust. A fast CRC-32 will outperform a cryptographically secure hash any day.
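A rough sketch of that two-stage check, using the standard `zlib.crc32` and `hashlib.sha1`. The helper names `crc32_of`, `sha1_of`, and `probably_equal` are made up for this example.

```python
import hashlib
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Compute a running CRC-32 over the file, one chunk at a time."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Passing the previous value continues the checksum across chunks.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def sha1_of(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of the file, one chunk at a time."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def probably_equal(path_a, path_b):
    """Cheap CRC-32 first; run the slower SHA-1 only when the CRCs match."""
    if crc32_of(path_a) != crc32_of(path_b):
        return False  # different CRCs prove the contents differ
    return sha1_of(path_a) == sha1_of(path_b)
```

Note that when the CRCs match, this sketch reads each file a second time for SHA-1. If reading the data dominates the cost, as the other answer argues, the second pass may cancel out much of the gain.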

answered Oct 15 '22 03:10 by Greg Hewgill