Hashing floating point values

Tags:

c++

floating-point

hash

Recently, I was curious how hash algorithms for floating-point values work, so I looked at the source code for boost::hash_value. It turns out to be fairly complicated: the actual implementation loops over each digit in the radix and accumulates a hash value. Compared to the integer hash functions, it's much more involved.

My question is: why should a floating-point hash algorithm be any more complicated? Why not just hash the binary representation of the floating-point value as if it were an integer?

Like:

std::size_t hash_value(float f)
{
  return hash_value(*(reinterpret_cast<int*>(&f)));
}

I realize that float is not guaranteed to be the same size as int on all systems, but that could be handled with a little template metaprogramming to deduce an integral type that is the same size as float. So what is the advantage of introducing an entirely different hash function that operates specifically on floating-point types?
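For illustration, the size deduction described above might look like the following minimal sketch. The names uint_of_size and bitwise_hash are hypothetical and are not part of Boost; the sketch assumes the floating-point type is exactly 4 or 8 bytes wide and that std::uint32_t and std::uint64_t exist. It copies the bytes with std::memcpy instead of dereferencing a reinterpret_cast pointer, because reading a float through an int* violates the strict aliasing rules.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <type_traits>

// Hypothetical helper: maps a byte count to an unsigned integer type of that size.
// Only 4 and 8 bytes are covered; any other size fails to compile.
template <std::size_t N> struct uint_of_size;
template <> struct uint_of_size<4> { using type = std::uint32_t; };
template <> struct uint_of_size<8> { using type = std::uint64_t; };

// Hypothetical function: hashes the raw object representation of a
// floating-point value. This is NOT how boost::hash_value works.
template <typename Float>
std::size_t bitwise_hash(Float f)
{
    static_assert(std::is_floating_point<Float>::value,
                  "floating-point types only");
    using Bits = typename uint_of_size<sizeof(Float)>::type;

    Bits bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the bytes; no aliasing violation
    return std::hash<Bits>{}(bits);
}
```

A type such as long double, whose size is often 12 or 16 bytes including padding, is rejected at compile time here instead of silently hashing padding bytes.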

asked Sep 13 '11 14:09

Channel72

1 Answer

Take a look at https://svn.boost.org/trac/boost/ticket/4038

In essence it boils down to two things:

Portability: if you hash the binary representation of a float, then on some platforms a float with the same value could have multiple binary representations. I don't know whether there is actually a platform where this issue exists, but given the complication of denormalized numbers, I can't rule it out.
Size: as you pointed out, sizeof(float) might not equal sizeof(int).
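One concrete case of the first point, assuming an IEEE 754 platform with a 32-bit float: positive and negative zero compare equal but have different bit patterns, so a purely bitwise hash can give two equal keys different hash values. The sketch below uses hypothetical helper names (float_bits, normalized_bits) and is only meant to show the effect and one possible workaround, not what Boost does.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw bits of a float.
// Assumes float is 32 bits wide (checked at compile time).
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Hypothetical helper: maps -0.0f to +0.0f before taking the bits, so that
// the two zeros, which compare equal, also produce identical hash input.
std::uint32_t normalized_bits(float f)
{
    if (f == 0.0f) {
        f = 0.0f;  // true for both +0.0f and -0.0f; replaces either with +0.0f
    }
    return float_bits(f);
}
```

With these helpers, 0.0f == -0.0f holds while float_bits(0.0f) and float_bits(-0.0f) differ in the sign bit, and normalized_bits returns the same value for both. NaN values are less of a concern for hash containers, since a NaN never compares equal to anything, including itself.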

I did not find anyone claiming that the Boost hash actually produces fewer collisions. I assume that separating the mantissa from the exponent might help, but the above link does not suggest that this was the driving design decision.

answered Sep 22 '22 12:09

H. Brandsmeier
