Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is std::hash not guaranteed to be deterministic?

Hereafter, we use N4140 (C++14 Standard).


According to § 17.6.3.4 Hash requirements,

The value returned shall depend only on the argument k for the duration of the program.

[ Note: Thus all evaluations of the expression h(k) with the same value for k yield the same result for a given execution of the program. — end note ]

and § 20.9.12 Class template hash says

...

the instantiation hash<Key> shall:

(1.1) — satisfy the Hash requirements (17.6.3.4) ...

(1.2) — ...


This means a hash value of value (i.e. hash<decltype(value)>(value)) may take a different value if you restart the program.

But why? This limitation was not in the Standard of C++11, but in the Standard of C++14, C++17 and C++20. As a user (not a STL developer), it would be quite useful if std::hash were deterministic. Are there any mathematical difficulties in implementing a deterministic hash function? But hash functions we daily use (e.g. deprecated md5sum or safer sha256) are all deterministic. Is there a problem of efficiency?

like image 579
ynn Avatar asked Mar 06 '20 16:03

ynn


People also ask

Should a hash function be deterministic?

A hash procedure must be deterministic—meaning that for a given input value it must always generate the same hash value. In other words, it must be a function of the data to be hashed, in the mathematical sense of the term.

Are hash functions non deterministic?

A hashing function is a one-way function that takes some input and returns a deterministic output. The output is often referred to as a digest, a hash code, or simply a hash.

Are hashes deterministic?

Hash functions are deterministic. Hence, when we pass the same input to the hash function, it always generates the same output hash code, e.g. SHA-1 hashes are 160 bits long. Uniformity – hashes should be distributed evenly across possible values. The specific hash function always produces fixed-size output.

Is hash table deterministic?

But hash functions we daily use (e.g. deprecated md5sum or safer sha256 ) are all deterministic.


2 Answers

There is no need for the hash function to be deterministic between runs, but you can still provide your own hash, e.g. for unordered containers if it's a behavior you rely on.

As for why, cppreference says:

Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision denial-of-service attacks.

If the Hash requirements tells it to be deterministic, then you wouldn't be able to provide a salted hash without breaking the requirement.

Here is the actual explanation why

like image 181
Geoffroy Avatar answered Sep 20 '22 22:09

Geoffroy


This answer (and links in it) suggested by @NathanOliver is ultimately helpful. Let me cite important parts.

For a non-cryptographic hash function, it's possible to pre-calculate massive inputs with the same hashed value to algorithmically slow down the unordered containers, and results in a denial-of-service attack.

(from Issue 2291. std::hash is vulnerable to collision DoS attack)

For this reason, language designers are migrating to random hashing. In random hashing, the hash value of the string “a” can change every time you run your program. Random hashing is now the default in Python (as of version 3.3), Ruby (as of version 1.9) and Perl (as of version 5.18).

(from Do you realize that you are using random hashing?)

Move to Ready, rather than Immediate, as even the permission has been contentious in reflector discussion

(from Issue 2291. std::hash is vulnerable to collision DoS attack)

In practice, as far as I understand, no implementation of std::hash implements random hashing but you can write your own my::secure_hash.

(from this answer)


P.S.

I just googled "hash table dos" and found an informative page: The moment when you realize every server in the world is vulnerable.

like image 20
ynn Avatar answered Sep 20 '22 22:09

ynn