Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Choosing a hash function for best performance

I need to compare many files (some could be large, some are small) over the network. So I am planning to hash every file on every client and send only the hash value over the network.
The main goal here is performance. This implies minimal network traffic. Security is not the issue.
There should also be "zero" collisions since I don't want ever to mistakenly consider two different files as identical. Saying that, I know that theoretically there are always collisions, I just want the chance to ever practically meet them be absolutely negligible.

So my question is: Which .net hash function is best for this task?

I was thinking to use a buffered reader and MD5CryptoServiceProvider (since CNG may not be available on all clients).

Is there a way to get better performance than that? (perhaps using some external library?)

like image 360
Amir Gonnen Avatar asked Apr 09 '12 07:04

Amir Gonnen


People also ask

Which hash function is fastest?

SHA-1 is fastest hashing function with ~587.9 ms per 1M operations for short strings and 881.7 ms per 1M for longer strings. MD5 is 7.6% slower than SHA-1 for short strings and 1.3% for longer strings.

How can you improve the performance of a hash table?

The simplest and most obvious improvement would be to increase the number of buckets in the hash table to something like 1.2 million -- at least assuming your hash function can generate numbers in that range (which it typically will).

What function can serve as a good hash function?

If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions i know. it has excellent distribution and speed on many different sets of keys and table sizes. you are not likely to do better with one of the "well known" functions such as PJW, K&R[1], etc. Also see tpop pp.


1 Answers

I was in a similar situation where I needed a .NET hashing algorithm. I needed it for server response caching where speed was more important than security. When I got to this thread, I noticed speculation about performance differences in choice of algorithm and in 32-bit versus 64-bit execution. To bring some science into this debate, I have created some code to actually test some of the available algorithms. I decided to test the built-in MD5, SHA1, SHA256, and SHA512 algorithms. I also included a CRC32 implementation from force-net and a CRC64 implementation from DamienGKit. My results with a ~115MB file are as follows:

Running in 32-bit mode.

Warm-up phase:

CRC32: 296 MiB/s [9C54580A] in 390ms.

CRC64: 95 MiB/s [636BCF1455BC885A] in 1212ms.

MD5: 191 MiB/s [mm/JVFusWMKcT/P+IR4BjQ==] in 604ms.

SHA1: 165 MiB/s [WSFkGbnYte5EXb7kgp1kqbi2...] in 699ms.

SHA256: 93 MiB/s [USKMHQmfMil8/KL/ASyE6rm/...] in 1240ms.

SHA512: 47 MiB/s [Cp9cazN7WsydTPn+k4Xu359M...] in 2464ms.

Final run:

CRC32: 279 MiB/s [9C54580A] in 414ms.

CRC64: 96 MiB/s [636BCF1455BC885A] in 1203ms.

MD5: 197 MiB/s [mm/JVFusWMKcT/P+IR4BjQ==] in 588ms.

SHA1: 164 MiB/s [WSFkGbnYte5EXb7kgp1kqbi2...] in 707ms.

SHA256: 96 MiB/s [USKMHQmfMil8/KL/ASyE6rm/...] in 1200ms.

SHA512: 47 MiB/s [Cp9cazN7WsydTPn+k4Xu359M...] in 2441ms.


Running in 64-bit mode.

Warm-up phase:

CRC32: 310 MiB/s [9C54580A] in 373ms.

CRC64: 117 MiB/s [636BCF1455BC885A] in 986ms.

MD5: 198 MiB/s [mm/JVFusWMKcT/P+IR4BjQ==] in 584ms.

SHA1: 184 MiB/s [WSFkGbnYte5EXb7kgp1kqbi2...] in 627ms.

SHA256: 104 MiB/s [USKMHQmfMil8/KL/ASyE6rm/...] in 1112ms.

SHA512: 149 MiB/s [Cp9cazN7WsydTPn+k4Xu359M...] in 778ms.

Final run:

CRC32: 292 MiB/s [9C54580A] in 396ms.

CRC64: 119 MiB/s [636BCF1455BC885A] in 975ms.

MD5: 199 MiB/s [mm/JVFusWMKcT/P+IR4BjQ==] in 582ms.

SHA1: 192 MiB/s [WSFkGbnYte5EXb7kgp1kqbi2...] in 601ms.

SHA256: 106 MiB/s [USKMHQmfMil8/KL/ASyE6rm/...] in 1091ms.

SHA512: 157 MiB/s [Cp9cazN7WsydTPn+k4Xu359M...] in 738ms.

These results were obtained from a compiled Release-build ASP.NET project running .NET v4.5.2. Both the 32-bit and 64-bit results are from the same machine. In Visual Studio, I changed the mode via Tools > Options > Projects and Solutions > Web Projects > Use the 64 bit version of IIS Express, along with changing the Platform target of the project.

We can see that although the results fluctuate a bit run-to-run, CRC32 (by force-net) is the fastest, followed by Microsoft's MD5 and SHA1. Curiously, there is no performance benefit in choosing DamienGKit's CRC64 over the built-in MD5 or SHA1. 64-bit execution seems to help a lot with SHA512 but only modestly with the others.

To answer the OP's question, it would seem that the built-in MD5 or SHA1 may provide the best balance of collision-avoidance and performance.

My code is as follows:

Stopwatch timer = new Stopwatch();
Force.Crc32.Crc32Algorithm hasherCRC32 = new Force.Crc32.Crc32Algorithm();
System.Security.Cryptography.MD5Cng hasherMD5 = new System.Security.Cryptography.MD5Cng();
System.Security.Cryptography.SHA1Cng hasherSHA1 = new System.Security.Cryptography.SHA1Cng();
System.Security.Cryptography.SHA256Cng hasherSHA256 = new System.Security.Cryptography.SHA256Cng();
System.Security.Cryptography.SHA512Cng hasherSHA512 = new System.Security.Cryptography.SHA512Cng();
String result = "";
String rate = "";

Status.Text += "Running in " + ((IntPtr.Size == 8) ? "64" : "32") + "-bit mode.<br /><br />";

Status.Text += "Warm-up phase:<br />";

timer.Restart();
result = BitConverter.ToUInt32(hasherCRC32.ComputeHash(ImageUploader.FileBytes), 0).ToString("X8");
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "CRC32: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = DamienG.Security.Cryptography.Crc64Iso.Compute(ImageUploader.FileBytes).ToString("X16");
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "CRC64: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherMD5.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "MD5: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA1.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA1: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA256.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA256: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA512.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA512: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

Status.Text += "<br />Final run:<br />";

timer.Restart();
result = BitConverter.ToUInt32(hasherCRC32.ComputeHash(ImageUploader.FileBytes), 0).ToString("X8");
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "CRC32: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = DamienG.Security.Cryptography.Crc64Iso.Compute(ImageUploader.FileBytes).ToString("X16");
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "CRC64: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherMD5.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "MD5: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA1.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA1: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA256.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA256: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA512.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA512: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";
like image 87
Michael Avatar answered Sep 18 '22 09:09

Michael