Hash table based containers are very fast associative array (e.g. unordered_map
, unordered_set
).
Their performance is highly dependent on that hash function used to create an index for each entry. As hash tables grow, elements are rehashed again and again.
Pointers are simple type, basically a 4/8 byte value that uniquely identify an object. The problem is that using an address as a result of the hash function is not efficient due to several LSB being zero.
Example:
struct MyVoidPointerHash { size_t operator()(const void* val) const { return (size_t)val; } };
A faster implementation is to lose a few bits:
struct MyVoidPointerHash2 { size_t operator()(const void* val) const { return ((size_t)val) >> 3; // 3 on 64 bit, 1 on 32 bit } };
The latter produced 10-20% performance increase on a large application that uses hash sets and maps with tens of thousands of elements that are frequently built and cleared.
Can someone offer a better scheme for hashing pointers?
The function needs to be:
Update - benchmark results
I ran two sets of tests, one for int*
and for a class pointer that has a size of 4KB. The results are very interesting.
I used std::unordered_set
for all test with data size being 16MB that was allocated in a single new
call. The first algorithm ran twice to make sure sure caches are as hot as possible and the CPU is running at full speed.
Setup: VS2013 (x64), i7-2600, Windows 8.1 x64.
return (size_t)(val);
return '(size_t)(val) >> 3;
uintptr_t ad = (uintptr_t)val; return (size_t)((13 * ad) ^ (ad >> 15));
uintptr_t ad = (uintptr_t)val; return (size_t)(ad ^ (ad >> 16));
Code:
template<typename Tval> struct MyTemplatePointerHash1 { size_t operator()(const Tval* val) const { static const size_t shift = (size_t)log2(1 + sizeof(Tval)); return (size_t)(val) >> shift; } };
Test 1 - int*
:
Test 1 - 4K_class*
:
Update2:
Winner so far is the templated hash (Hash5) function. Best level of performance for speed for various block sizes.
Update 3: Added default hash function for baseline. Turns out it's far from optimal.
The correct answer from a theoretical point of view is: "Use std::hash
which is likely specialized to be as good as it gets, and if that is not applicable, use a good hash function rather than a fast one. The speed of the hash function does not matter so much as its quality".
The practical answer is: "Use std::hash
, which is piss-poor, but nevertheless performs surprisingly well."
TL;DR
After having become intrigued, I ran about 30 hours of benchmarks over the weekend. Among other things, I tried to get an average case vs. worst case and tried to coerce std::unordered_map
into worst-case behavior by giving it deliberately bad hints on bucket counts in respect of the set size inserted.
I compared poor hashes (std::hash<T*>
) to well-known general purpose hashes of overall good quality (djb2, sdbm) as well as variations of these that account for the very short input length, and to hashes which are explicitly thought to be used in hashtables (murmur2 and murmur3), and piss-poor hashes that are actually worse than not hashing at all since they throw away entropy.
Since the lowest 2-3 bits are always zero on pointers due to alignment, I deemed it worthwhile to test a simple right-shift as "hash", so only the non-zero information would be used, in case the hashtable e.g. used only the lowermost N bits. Turned out that for reasonable shifts (I also tried unreasonable shifts!) this actually performs quite well.
Some of my findings were well-known and unsurprising, others are very surprising:
std::unordered_map
into behaving badly. Even when given deliberately bad hints to bucket counts which will force several re-hashes, the overall performance is not much worse. Only the very worst hash functions that throw away most of the entropy in an almost ridiculous way are able to significantly impact performance by more than 10-20% (such as right_shift_12
, which practically results in only 12 distinct hash values for 50,000 inputs! It is not surprising that the hash map runs around 100 times slower in this case -- we're basically doing random access lookup on a linked list.).std::hash<T*>
is outright pathetic for GCC (a simple reinterpret_cast
). Funnily, a functor which does the identical thing consistently performs faster at insertions and slower at random access. The difference is small (a dozen milliseconds on a 8-10 second test run), but it is not noise, it's consistenly present -- probably related to instruction reordering or pipelining. It's stunning how the exact same code (which is also a no-op) consistenly performs differently in two different scenarios.std::hash
. Usually they'll land in the top 3-4.Test were done not just any 4-byte or 8-byte (or whatever) aligned values, but for actual addresses obtained from allocating the complete set of elements on the heap, and storing the addresses as provided by the allocator in a std::vector
(the objects were then deleted, they're not needed).
Addresses were inserted into a newly allocated std::unordered_map
for every test in the order stored in the vector, once in the original order ("sequential") and once after applying a std::random_shuffle
on the vector.
Tests were performed for sets of 50,000 and 1,000,000 objects of size 4, 16, 64, 256, and 1024 (results for 64 omitted here for brevity, they're as you'd expect somewhere in the middle between 16 and 256 -- StackOverflow only allows 30k characters being posted).
Test suite was performed 3 times, results varying by 3 or 4 milliseconds here and there, but overall identical. The results posted here are the last run.
The order of insertions in the "random" test as well as the access pattern (in every test) is pseudorandom, but exactly identical for every hash function in a test run.
The timings under hash benchmarks are for summing up 4,000,000,000 hash values in an integer variable.
The column insert
is the time in milliseconds for 50 iterations of creating a std::unordered_map
, inserting 50,000 and 1,000,000 elements respectively, and destroying the map.
The column access
is the time in milliseconds to do 100,000,000 lookups of a pseudorandom element in the 'vector' followed by looking up that address in the unordered_map
.
This timing includes on the average one cache misse for accessing a random element in the vector
, at least for the large dataset (small dataset fits completely into L2).
All timings on a 2.66GHz Intel Core2, Windows 7, gcc 4.8.1/MinGW-w64_32. Timer granularity @ 1ms.
Source code is available on Ideone, again because of Stackoverflow's 30k character limit.
Note: Running the complete test suite takes well over 2 hours on a desktop PC, so be prepared to take a walk if you want to reproduce the results.
Benchmarking hash funcs... std::hash 2576 reinterpret_cast 2561 djb2 13970 djb2_mod 13969 sdbm 10246 yet_another_lc 13966 murmur2 11373 murmur3 15129 simple_xorshift 7829 double_xorshift 13567 right_shift_2 5806 right_shift_3 5866 right_shift_4 5705 right_shift_5 5691 right_shift_8 5795 right_shift_12 5728 MyTemplatePointerHash1 5652 BasileStarynkevitch 4315 -------------------------------- sizeof(T) = 4 sizeof(T*) = 4 insertion order = sequential dataset size = 50000 elements name insert access std::hash 421 6988 reinterpret_cast 408 7083 djb2 451 8875 djb2_mod 450 8815 sdbm 455 8673 yet_another_lc 443 8292 murmur2 478 9006 murmur3 490 9213 simple_xorshift 460 8591 double_xorshift 477 8839 right_shift_2 416 7144 right_shift_3 422 7145 right_shift_4 414 6811 right_shift_5 425 8006 right_shift_8 540 11787 right_shift_12 1501 49604 MyTemplatePointerHash1 410 7138 BasileStarynkevitch 445 8014 -------------------------------- sizeof(T) = 4 sizeof(T*) = 4 insertion order = random dataset size = 50000 elements name insert access std::hash 443 7570 reinterpret_cast 436 7658 djb2 473 8791 djb2_mod 472 8766 sdbm 472 8817 yet_another_lc 458 8419 murmur2 479 9005 murmur3 491 9205 simple_xorshift 464 8591 double_xorshift 476 8821 right_shift_2 441 7724 right_shift_3 440 7716 right_shift_4 450 8061 right_shift_5 463 8653 right_shift_8 649 16320 right_shift_12 3052 114185 MyTemplatePointerHash1 438 7718 BasileStarynkevitch 453 8140 -------------------------------- sizeof(T) = 4 sizeof(T*) = 4 insertion order = sequential dataset size = 1000000 elements name insert access std::hash 8945 32801 reinterpret_cast 8796 33251 djb2 11139 54855 djb2_mod 11041 54831 sdbm 11459 36849 yet_another_lc 14258 57350 murmur2 16300 39024 murmur3 16572 39221 simple_xorshift 14930 38509 double_xorshift 16192 38762 right_shift_2 8843 33325 right_shift_3 8791 32979 right_shift_4 8818 32510 right_shift_5 8775 30436 right_shift_8 10505 35960 right_shift_12 30481 91350 MyTemplatePointerHash1 8800 33287 BasileStarynkevitch 12885 37829 -------------------------------- sizeof(T) = 4 sizeof(T*) = 4 insertion order = random dataset size = 1000000 elements name insert access std::hash 12183 33424 reinterpret_cast 12125 34000 djb2 22693 51255 djb2_mod 22722 51266 sdbm 15160 37221 yet_another_lc 24125 51850 murmur2 16273 39020 murmur3 16587 39270 simple_xorshift 16031 38628 double_xorshift 16233 38757 right_shift_2 11181 33896 right_shift_3 10785 33660 right_shift_4 10615 33204 right_shift_5 10357 38216 right_shift_8 15445 100348 right_shift_12 73773 1044919 MyTemplatePointerHash1 11091 33883 BasileStarynkevitch 15701 38092 -------------------------------- sizeof(T) = 64 sizeof(T*) = 4 insertion order = sequential dataset size = 50000 elements name insert access std::hash 415 8243 reinterpret_cast 422 8321 djb2 445 8730 djb2_mod 449 8696 sdbm 460 9439 yet_another_lc 455 9003 murmur2 475 9109 murmur3 482 9313 simple_xorshift 463 8694 double_xorshift 465 8900 right_shift_2 416 8402 right_shift_3 418 8405 right_shift_4 423 8366 right_shift_5 421 8347 right_shift_8 453 9195 right_shift_12 666 18008 MyTemplatePointerHash1 433 8191 BasileStarynkevitch 466 8443 -------------------------------- sizeof(T) = 64 sizeof(T*) = 4 insertion order = random dataset size = 50000 elements name insert access std::hash 450 8135 reinterpret_cast 457 8208 djb2 470 8736 djb2_mod 476 8698 sdbm 483 9420 yet_another_lc 476 8953 murmur2 481 9089 murmur3 486 9283 simple_xorshift 466 8663 double_xorshift 468 8865 right_shift_2 456 8301 right_shift_3 456 8302 right_shift_4 453 8337 right_shift_5 457 8340 right_shift_8 505 10379 right_shift_12 1099 34923 MyTemplatePointerHash1 464 8226 BasileStarynkevitch 466 8372 -------------------------------- sizeof(T) = 64 sizeof(T*) = 4 insertion order = sequential dataset size = 1000000 elements name insert access std::hash 9548 35362 reinterpret_cast 9635 35869 djb2 10668 37339 djb2_mod 10763 37311 sdbm 11126 37145 yet_another_lc 11597 39944 murmur2 16296 39029 murmur3 16432 39280 simple_xorshift 16066 38645 double_xorshift 16108 38778 right_shift_2 8966 35953 right_shift_3 8916 35949 right_shift_4 8973 35504 right_shift_5 8941 34997 right_shift_8 9356 31233 right_shift_12 13831 45799 MyTemplatePointerHash1 8839 31798 BasileStarynkevitch 15349 38223 -------------------------------- sizeof(T) = 64 sizeof(T*) = 4 insertion order = random dataset size = 1000000 elements name insert access std::hash 14756 36237 reinterpret_cast 14763 36918 djb2 15406 38771 djb2_mod 15551 38765 sdbm 14886 37078 yet_another_lc 15700 40290 murmur2 16309 39024 murmur3 16432 39381 simple_xorshift 16177 38625 double_xorshift 16073 38750 right_shift_2 14732 36961 right_shift_3 14170 36965 right_shift_4 13687 37295 right_shift_5 11978 35135 right_shift_8 11498 46930 right_shift_12 25845 268052 MyTemplatePointerHash1 10150 32046 BasileStarynkevitch 15981 38143 -------------------------------- sizeof(T) = 256 sizeof(T*) = 4 insertion order = sequential dataset size = 50000 elements name insert access std::hash 432 7957 reinterpret_cast 429 8036 djb2 462 8970 djb2_mod 453 8884 sdbm 460 9110 yet_another_lc 466 9015 murmur2 495 9147 murmur3 494 9300 simple_xorshift 479 8792 double_xorshift 477 8948 right_shift_2 430 8120 right_shift_3 429 8132 right_shift_4 432 8196 right_shift_5 437 8324 right_shift_8 425 8050 right_shift_12 519 11291 MyTemplatePointerHash1 425 8069 BasileStarynkevitch 468 8496 -------------------------------- sizeof(T) = 256 sizeof(T*) = 4 insertion order = random dataset size = 50000 elements name insert access std::hash 462 7956 reinterpret_cast 456 8046 djb2 490 9002 djb2_mod 483 8905 sdbm 482 9116 yet_another_lc 492 8982 murmur2 492 9120 murmur3 492 9276 simple_xorshift 477 8761 double_xorshift 477 8903 right_shift_2 458 8116 right_shift_3 459 8124 right_shift_4 462 8281 right_shift_5 463 8370 right_shift_8 458 8069 right_shift_12 662 16244 MyTemplatePointerHash1 459 8091 BasileStarynkevitch 472 8476 -------------------------------- sizeof(T) = 256 sizeof(T*) = 4 insertion order = sequential dataset size = 1000000 elements name insert access std::hash 9756 34368 reinterpret_cast 9718 34897 djb2 10935 36894 djb2_mod 10820 36788 sdbm 11084 37857 yet_another_lc 11125 37996 murmur2 16522 39078 murmur3 16461 39314 simple_xorshift 15982 38722 double_xorshift 16151 38868 right_shift_2 9611 34997 right_shift_3 9571 35006 right_shift_4 9135 34750 right_shift_5 8978 32878 right_shift_8 8688 30276 right_shift_12 10591 35827 MyTemplatePointerHash1 8721 30265 BasileStarynkevitch 15524 38315 -------------------------------- sizeof(T) = 256 sizeof(T*) = 4 insertion order = random dataset size = 1000000 elements name insert access std::hash 14169 36078 reinterpret_cast 14096 36637 djb2 15373 37492 djb2_mod 15279 37438 sdbm 15531 38247 yet_another_lc 15924 38779 murmur2 16524 39109 murmur3 16422 39280 simple_xorshift 16119 38735 double_xorshift 16136 38875 right_shift_2 14319 36692 right_shift_3 14311 36776 right_shift_4 13932 35682 right_shift_5 12736 34530 right_shift_8 9221 30663 right_shift_12 15506 98465 MyTemplatePointerHash1 9268 30697 BasileStarynkevitch 15952 38349 -------------------------------- sizeof(T) = 1024 sizeof(T*) = 4 insertion order = sequential dataset size = 50000 elements name insert access std::hash 421 7863 reinterpret_cast 419 7953 djb2 457 8983 djb2_mod 455 8927 sdbm 445 8609 yet_another_lc 446 8892 murmur2 492 9090 murmur3 507 9294 simple_xorshift 467 8687 double_xorshift 472 8872 right_shift_2 432 8009 right_shift_3 432 8014 right_shift_4 435 7998 right_shift_5 442 8099 right_shift_8 432 7914 right_shift_12 462 8911 MyTemplatePointerHash1 426 7744 BasileStarynkevitch 467 8417 -------------------------------- sizeof(T) = 1024 sizeof(T*) = 4 insertion order = random dataset size = 50000 elements name insert access std::hash 452 7948 reinterpret_cast 456 8018 djb2 489 9037 djb2_mod 490 8992 sdbm 477 8795 yet_another_lc 491 9179 murmur2 502 9078 murmur3 507 9273 simple_xorshift 473 8671 double_xorshift 480 8873 right_shift_2 470 8105 right_shift_3 470 8100 right_shift_4 476 8333 right_shift_5 468 8065 right_shift_8 469 8094 right_shift_12 524 10216 MyTemplatePointerHash1 451 7826 BasileStarynkevitch 472 8419 -------------------------------- sizeof(T) = 1024 sizeof(T*) = 4 insertion order = sequential dataset size = 1000000 elements name insert access std::hash 10910 38432 reinterpret_cast 10892 38994 djb2 10499 38985 djb2_mod 10507 38983 sdbm 11318 37450 yet_another_lc 11740 38260 murmur2 16960 39544 murmur3 16816 39647 simple_xorshift 16096 39021 double_xorshift 16419 39183 right_shift_2 10219 38909 right_shift_3 10012 39036 right_shift_4 10642 40284 right_shift_5 10116 38678 right_shift_8 9083 32914 right_shift_12 9376 31586 MyTemplatePointerHash1 8777 30129 BasileStarynkevitch 16036 38913 -------------------------------- sizeof(T) = 1024 sizeof(T*) = 4 insertion order = random dataset size = 1000000 elements name insert access std::hash 16304 38695 reinterpret_cast 16241 39641 djb2 16377 39533 djb2_mod 16428 39526 sdbm 15402 37811 yet_another_lc 16180 38730 murmur2 16679 39355 murmur3 16792 39586 simple_xorshift 16196 38965 double_xorshift 16371 39127 right_shift_2 16445 39263 right_shift_3 16598 39421 right_shift_4 16378 39839 right_shift_5 15517 38442 right_shift_8 11589 33547 right_shift_12 11992 49041 MyTemplatePointerHash1 9052 30222 BasileStarynkevitch 16163 38829
After letting this question lay for a while, I'll post my best hash function for pointers so far:
template<typename Tval> struct MyTemplatePointerHash1 { size_t operator()(const Tval* val) const { static const size_t shift = (size_t)log2(1 + sizeof(Tval)); return (size_t)(val) >> shift; } };
It's high performing for various block sizes.
If someone has a better function, I'll change the accepted answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With