Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hashing of pointer values

Sometimes you need to take a hash function of a pointer; not the object the pointer points to, but the pointer itself. Lots of the time, folks just punt and use the pointer value as an integer, chop off some high bits to make it fit, maybe shift out known-zero bits at the bottom. Thing is, pointer values aren't necessarily well-distributed in the code space; in fact, if your allocator is doing its job, there's an excellent chance they're all clustered close together.

So, my question is, has anyone developed hash functions that are good for this? Take a 32- or 64-bit value that's maybe got 12 bits of entropy in it somewhere and spread it evenly across a 32-bit number space.

like image 777
zwol Avatar asked Aug 09 '10 17:08

zwol


People also ask

What are hashed values?

Hash values can be thought of as fingerprints for files. The contents of a file are processed through a cryptographic algorithm, and a unique numerical value – the hash value - is produced that identifies the contents of the file.

What is hash value in hashing?

Hashing is the process of generating a value from a text or a list of numbers using a mathematical function known as a hash function. A Hash Function is a function that converts a given numeric or alphanumeric key to a small practical integer value. The mapped integer value is used as an index in the hash table.

What is the formula for hashing?

With modular hashing, the hash function is simply h(k) = k mod m for some m (usually, the number of buckets). The value k is an integer hash code generated from the key. If m is a power of two (i.e., m=2p), then h(k) is just the p lowest-order bits of k.


2 Answers

This page lists several methods that might be of use. One of them, due to Knuth, is a simple as multiplying (in 32 bits) by 2654435761, but "Bad hash results are produced if the keys vary in the upper bits." In the case of pointers, that's a rare enough situation.

Here are some more algorithms, including performance tests.

It seems that the magic words are "integer hashing".

like image 154
Thomas Avatar answered Sep 21 '22 14:09

Thomas


They'll likely exhibit locality, yes - but in the lower bits, which means objects will be distributed through the hashtable. You'll only see collisions if a pointer's address is a multiple of the hashtable's length from another pointer.

like image 34
Nick Johnson Avatar answered Sep 18 '22 14:09

Nick Johnson