Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extendible hashing: why does anyone use most significant bits?

When coding extendible hashing, one has the choice of using the most significant bits or the least significant bits of the hash value in order to determine which bucket to hash to. Using least significant bits has a number of advantages:

  • When you double the directory, you can just copy all the pointers, instead of having to create a new directory that interleaves them.
  • You can simplify discussion of the algorithm by not even talking about bits at all, and just using modular arithmetic as you would with hashing in general. Using the 3 least significant bits to choose a bucket is the same as h(x) = x mod 2^3.
  • You don't need to specify in advance a width of the binary numbers; if you're using most significant bits, you need to have a specific bit length in mind.

What I can't wrap my head around is why reference after reference after reference shows extendible hashing done with most significant bits. As far as I can tell, the only advantage most significant bits yields is a diagram on paper (or on screen) that doesn't have crossing lines. Is there any good reason why so many sources so most significant bits instead of least?

like image 853
Vultan Avatar asked Nov 10 '14 23:11

Vultan


1 Answers

I finally went back to the original source paper by Fagin, et. al. They address this:

"We note that if we had used suffixes of pseudokeys instead of prefixes, then the algorithm for doubling the directory would be especially easy: it would essentially consist of making a second copy of the nonheader portion of the directory, immediately after the first copy. However, we chose to use prefixes for the sake of intuitive simplicity (thus, by using prefixes the keys can easily be accessed in pseudokey order, rather than in inverted pseudokey order). "

I don't understand why they saw this approach as more intuitive, as you could dispense with the whole bit idea and go with modular arithmetic instead, but it appears that this was at least their rationale.

like image 126
Vultan Avatar answered Oct 03 '22 13:10

Vultan