Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Order independent hashing

Tags:

c++

hash

crypto++

I'm creating keys for key-value data by taking a few (<10) pieces of information that identify the data and producing one hash from them combined. For this, I've been using CryptoPP's SHA256::Update function, which lets you add pieces at a time:

#include "sha.h"
...
byte outputBuf[CryptoPP::SHA256::DIGESTSIZE];
CryptoPP::SHA256 hash;
hash.Update(pData1, lenData1); // pData* can point to int, double or std::string
hash.Update(pData2, lenData2);
...
hash.Final(outputBuf);

I've noticed that the order of the calls to Update matters (i.e. if you change the order of the two Update statements, you'll get a different hash). I would like this to be order independent instead. So:

  • Does CryptoPP offer a way to do this?
  • If not, can you suggest an alternative approach? So far I think using xor to combine the parameters would work. One problem is that if two pieces of data are the same, they'll cancel out. Can you foresee problems with this?
like image 213
Ari Avatar asked Sep 13 '25 04:09

Ari


1 Answers

The comment saying that xor will increase the number of collisions is only true, if you consider {1, 2} and {2, 1} to be different inputs. I guess, you don't, as otherwise you wouldn't want an order-independent hash. So h({1, 2}) = h({2, 1}) is no collision as you're providing the same input.

The simplest solution is sorting and than using your favorite hash function. It is as secure as your hash function (confirm on crypto.stackexchange.com if you care).

Xoring hashes is definitely a bad idea as two equal elements cancel out. Adding them is much better, but with two equal elements, the least significant bit will be zero (with four such elements, two bits will be zero, etc.). This may be acceptable.

Note that any such method is pretty insecure as it allows finding collisions much faster (proof upon request). You may or may not need security, but don't try to invent a secure method, as it's practically impossible (every well-known hash function has many man-months of analysis behind it).

like image 90
maaartinus Avatar answered Sep 15 '25 17:09

maaartinus