Since language standards rarely mandate implementation methods, I'd like to know what is the real world hashing method used by C++ standard library implementations (libc++, libstdc++ and dinkumware).
In case it's not clear, I expect the answer to be a method like these :
- Hashing with chaining
- Hashing by Division / Multiplication
- Universal hashing
- Perfect hashing (static, dynamic)
- Hashing with open addressing (linear/quadratic probing or double hashing)
- Robin-Hood hashing
- Bloom Filters
- Cuckoo hashing
Knowing why a particular method was chosen over the others would be a good thing as well.
-
libstdc++: Chaining, only power-of-two table size, default (if it is even configurable) load threshold for rehashing is 1.0, buckets are all separate allocations. Outdated. I don't know current state of things.
- Rust: Robin Hood, default load threshold for rehashing is 0.9 (too much for open addressing, BTW)
- Go: table slots point to "bins" of 5(7?) slots, not sure what happens if bin is full, AFAIR it is growing in a
vector
/ArrayList
manner
- Java: chaining, only power-of-two table size, default load threshold is 0.75 (configurable), buckets (called entries) are all separate allocations. In recent versions of Java, above a certain threshold, chains are changed to binary search trees.
- C#: chaining, buckets are allocated from a flat array of bucket structures. If this array is full, it is rehashed (with the table, I suppose) in a
vector
/ArrayList
manner.
- Python: open addressing, with own unique collision-resolution scheme (not very fortunate, IMHO), only power-of-two table sizes, load threshold for rehashing is 0.666.. (good). However, slot data in a separate array of structures (like in C#), i. e. hash table operations touch at least two different random memory locations (in the table and in the array of slot data)
If some points missed in descriptions, it doesn't mean they are absent, it means I don't know/remember details.