Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What hashing method is implemented in standard unordered containers? [closed]

Since language standards rarely mandate implementation methods, I'd like to know what is the real world hashing method used by C++ standard library implementations (libc++, libstdc++ and dinkumware).

In case it's not clear, I expect the answer to be a method like these :

  • Hashing with chaining
  • Hashing by Division / Multiplication
  • Universal hashing
  • Perfect hashing (static, dynamic)
  • Hashing with open addressing (linear/quadratic probing or double hashing)
  • Robin-Hood hashing
  • Bloom Filters
  • Cuckoo hashing

Knowing why a particular method was chosen over the others would be a good thing as well.

like image 333
Nikos Athanasiou Avatar asked Jul 07 '15 11:07

Nikos Athanasiou


1 Answers

  • libstdc++: Chaining, only power-of-two table size, default (if it is even configurable) load threshold for rehashing is 1.0, buckets are all separate allocations. Outdated. I don't know current state of things.
  • Rust: Robin Hood, default load threshold for rehashing is 0.9 (too much for open addressing, BTW)
  • Go: table slots point to "bins" of 5(7?) slots, not sure what happens if bin is full, AFAIR it is growing in a vector/ArrayList manner
  • Java: chaining, only power-of-two table size, default load threshold is 0.75 (configurable), buckets (called entries) are all separate allocations. In recent versions of Java, above a certain threshold, chains are changed to binary search trees.
  • C#: chaining, buckets are allocated from a flat array of bucket structures. If this array is full, it is rehashed (with the table, I suppose) in a vector/ArrayList manner.
  • Python: open addressing, with own unique collision-resolution scheme (not very fortunate, IMHO), only power-of-two table sizes, load threshold for rehashing is 0.666.. (good). However, slot data in a separate array of structures (like in C#), i. e. hash table operations touch at least two different random memory locations (in the table and in the array of slot data)

If some points missed in descriptions, it doesn't mean they are absent, it means I don't know/remember details.

like image 124
leventov Avatar answered Nov 14 '22 23:11

leventov