Say I have a set of integers ranging between 1-100. I will only have 5 of these integers drawn out of a hat. I want to then take those 5 integers and place them into 5 buckets guaranteed unique (without having to deduplicate or anything using something like quadratic probing). Wondering how to do that.
For example, say I have these numbers (random from 1-100):
1 5 20 50 100
I then want to take those numbers and place them into these 5 buckets:
a b c d e
Using some hash function to accomplish it. For example, perhaps like this:
hash(1) -> b
hash(5) -> a
hash(20) -> e
hash(50) -> d
hash(100) -> c
Wondering how to write the hash function so that it takes a number x
from a domain of numbers D
and a set of numbers D(X)
from that domain, and outputs 1 bucket b
from the set of buckets B
.
H : D(X) -> B
Next time around I might have 6 numbers between 1 and 1,000, going into 6 buckets. So then I would need a new hash function that works using those constraints (6 numbers, 6 buckets, range 1-1,000).
The goal is as few steps as possible.
Note: The hash function for this example won't take integers in a domain larger than 10,000 lets say, as well as the size of the set of integers limited to some small number too like 1,000.
Basically I am trying to get this to happen:
// var domain = [1, 2, ..., 100]
// var set = [1, 5, 20, 50, 100]
// var buckets = [1, 2, 3, 4, 5]
hash(1) // 2
hash(5) // 1
hash(20) // 5
hash(50) // 4
hash(100) // 3
function hash(integer) {
if (integer == 1) return 2
if (integer == 5) return 1
if (integer == 20) return 5
if (integer == 50) return 4
if (integer == 100) return 3
}
But I don't know how to construct that hash function dynamically.
One solution (in JavaScript) would be to just create a map like this:
var map = {
1: 2,
5: 1,
20: 5,
50: 4,
100: 3
}
But that's sort of cheating because the object in JavaScript is implemented as a hashtable underneath (or something like that). So I am looking for how to do this at a low level, just using basically what assembly gives you.
Pretty much, I want to do this:
1
5 |
| | 20
| | 50 |
| | 100 | |
[ slot1, slot2, slot3, slot4, slot5 ]
Where 1
is somehow "hashed" to go into that slot2
in an array of size 5 (that slot is arbitrary for this example), etc.
Hash buckets are used to apportion data items for sorting or lookup purposes. The aim of this work is to weaken the linked lists so that searching for a specific item can be accessed within a shorter timeframe. A hash table that uses buckets is actually a combination of an array and a linked list.
The number of buckets in a hash structure will almost always be on the order of the number of items in the hash structure. The phrase "on the order of" is intentionally imprecise. That means you could have twice as many buckets as items. Or two times as many items as buckets.
A good hash function to use with integer key values is the mid-square method. The mid-square method squares the key value, and then takes out the middle r bits of the result, giving a value in the range 0 to 2r−1. This works well because most or all bits of the key value contribute to the result.
A bucket is simply a fast-access location (like an array index) that is the the result of the hash function. The idea with hashing is to turn a complex input value into a different value which can be used to rapidly extract or store data.
Suppose the domain of your integer values is the range from 0 to n-1, and you want the set of values [x0, x1, ..., xk-1] to map to values from 0 to k-1.
Create an array of n values containing the numbers from 0 to k-1 in roughly equal amounts, for example [a0 = 0, a1 = 1, ..., ak = 0, ..., an = n%k].
Then for each of the k values in the initial set (xi, where i = 0 .. k-1), change the k-th element of this array to i, either by direct assignment or by swapping with a value from elsewhere (taking care not to clobber a value set for a previous element of the initial set).
Then to hash a value y, just fetch the y-th value from this array.
Here's a Javascript demo that basically implements the above algorithm, except that instead of pre-filling the array with values from 0 to k-1, it first inserts the hash values for the selected items, then fills the remaining items with the repeating sequence of numbers from 0 to k-1. You will probably get better collision resistance by using a random sequence instead of incrementing values, but I hope you get the picture.
var hash_array;
function generate_hash() {
var i, j, k;
var v = document.getElementById;
var n = document.getElementById("n").value;
// Create a new hash lookup table
hash_array = Array(n);
// Initialize every value to -1
for (i=0; i<n; i++) hash_array[i] = -1;
// Map the given values to the first k hash buckets
var initial_values = document.getElementById("init").value.split(/ +/);
k = initial_values.length;
for (i=0; i<k; i++) {
hash_array[initial_values[i]] = i;
}
// Fill the remaining buckets with values from 0 to k-1
// This could be done by selecting values randomly, but
// here we're just cycling through the values from 0 to k-1
for (i=j=0; i<hash_array.length; i++) {
if (hash_array[i] == -1) {
hash_array[i] = j;
j = (j + 1) % k;
}
}
document.getElementById("gen").innerHTML = "Hash lookup table:<br>" + hash_array.join(", ");
}
<h2>Demo</h2>
<p>Creating a hash function that works on integer values less than <i>n</i>. What is the value of <i>n</i>?<br>
<input type="number" id="n" min="6" max="100" value="20"/></p>
<p>Enter a few different values separated by spaces. These will hash to the first buckets<br/>
<input type="text" size="40" id="init" value="2 3 5 6 9"/></p>
<p id="gen"><button onclick="generate_hash(); return false">Generate hash table</button></p>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With