Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash a Set of Integers from a Domain into a Set of Buckets

Tags:

algorithm

hash

Say I have a set of integers ranging between 1-100. I will only have 5 of these integers drawn out of a hat. I want to then take those 5 integers and place them into 5 buckets guaranteed unique (without having to deduplicate or anything using something like quadratic probing). Wondering how to do that.

For example, say I have these numbers (random from 1-100):

1 5 20 50 100

I then want to take those numbers and place them into these 5 buckets:

a b c d e

Using some hash function to accomplish it. For example, perhaps like this:

hash(1)   -> b
hash(5)   -> a
hash(20)  -> e
hash(50)  -> d
hash(100) -> c

Wondering how to write the hash function so that it takes a number x from a domain of numbers D and a set of numbers D(X) from that domain, and outputs 1 bucket b from the set of buckets B.

H : D(X) -> B

Next time around I might have 6 numbers between 1 and 1,000, going into 6 buckets. So then I would need a new hash function that works using those constraints (6 numbers, 6 buckets, range 1-1,000).

The goal is as few steps as possible.

Note: The hash function for this example won't take integers in a domain larger than 10,000 lets say, as well as the size of the set of integers limited to some small number too like 1,000.


Update

Basically I am trying to get this to happen:

// var domain = [1, 2, ..., 100]
// var set = [1, 5, 20, 50, 100]
// var buckets = [1, 2, 3, 4, 5]

hash(1) // 2
hash(5) // 1
hash(20) // 5
hash(50) // 4
hash(100) // 3

function hash(integer) {
  if (integer == 1) return 2
  if (integer == 5) return 1
  if (integer == 20) return 5
  if (integer == 50) return 4
  if (integer == 100) return 3
}

But I don't know how to construct that hash function dynamically.

One solution (in JavaScript) would be to just create a map like this:

var map = {
  1: 2,
  5: 1,
  20: 5,
  50: 4,
  100: 3
}

But that's sort of cheating because the object in JavaScript is implemented as a hashtable underneath (or something like that). So I am looking for how to do this at a low level, just using basically what assembly gives you.

Pretty much, I want to do this:

           1                     
    5      |                     
    |      |                    20
    |      |             50     |
    |      |      100    |      |
[ slot1, slot2, slot3, slot4, slot5 ]

Where 1 is somehow "hashed" to go into that slot2 in an array of size 5 (that slot is arbitrary for this example), etc.

like image 346
Lance Avatar asked Jul 06 '18 10:07

Lance


People also ask

What is a bucket hashing?

Hash buckets are used to apportion data items for sorting or lookup purposes. The aim of this work is to weaken the linked lists so that searching for a specific item can be accessed within a shorter timeframe. A hash table that uses buckets is actually a combination of an array and a linked list.

How many buckets are in a hash?

The number of buckets in a hash structure will almost always be on the order of the number of items in the hash structure. The phrase "on the order of" is intentionally imprecise. That means you could have twice as many buckets as items. Or two times as many items as buckets.

What is a good hash function for integers?

A good hash function to use with integer key values is the mid-square method. The mid-square method squares the key value, and then takes out the middle r bits of the result, giving a value in the range 0 to 2r−1. This works well because most or all bits of the key value contribute to the result.

What is a bucket array in the context of a hash code?

A bucket is simply a fast-access location (like an array index) that is the the result of the hash function. The idea with hashing is to turn a complex input value into a different value which can be used to rapidly extract or store data.


1 Answers

Suppose the domain of your integer values is the range from 0 to n-1, and you want the set of values [x0, x1, ..., xk-1] to map to values from 0 to k-1.

Create an array of n values containing the numbers from 0 to k-1 in roughly equal amounts, for example [a0 = 0, a1 = 1, ..., ak = 0, ..., an = n%k].

Then for each of the k values in the initial set (xi, where i = 0 .. k-1), change the k-th element of this array to i, either by direct assignment or by swapping with a value from elsewhere (taking care not to clobber a value set for a previous element of the initial set).

Then to hash a value y, just fetch the y-th value from this array.


DEMO

Here's a Javascript demo that basically implements the above algorithm, except that instead of pre-filling the array with values from 0 to k-1, it first inserts the hash values for the selected items, then fills the remaining items with the repeating sequence of numbers from 0 to k-1. You will probably get better collision resistance by using a random sequence instead of incrementing values, but I hope you get the picture.

var hash_array;

function generate_hash() {
  var i, j, k;
  var v = document.getElementById;
  var n = document.getElementById("n").value;
  // Create a new hash lookup table
  hash_array = Array(n);
  // Initialize every value to -1
  for (i=0; i<n; i++) hash_array[i] = -1;
  // Map the given values to the first k hash buckets
  var initial_values = document.getElementById("init").value.split(/ +/);
  k = initial_values.length;
  for (i=0; i<k; i++) {
    hash_array[initial_values[i]] = i;
  }
  // Fill the remaining buckets with values from 0 to k-1
  // This could be done by selecting values randomly, but
  // here we're just cycling through the values from 0 to k-1
  for (i=j=0; i<hash_array.length; i++) {
    if (hash_array[i] == -1) {
      hash_array[i] = j;
      j = (j + 1) % k;
    }
  }
  document.getElementById("gen").innerHTML = "Hash lookup table:<br>" + hash_array.join(", ");
}
<h2>Demo</h2>
<p>Creating a hash function that works on integer values  less than <i>n</i>. What is the value of <i>n</i>?<br>
<input type="number" id="n" min="6" max="100" value="20"/></p>
<p>Enter a few different values separated by spaces. These will hash to the first buckets<br/>
<input type="text" size="40" id="init" value="2 3 5 6 9"/></p>
<p id="gen"><button onclick="generate_hash(); return false">Generate hash table</button></p>
like image 179
r3mainer Avatar answered Sep 30 '22 01:09

r3mainer