I want to generate a consistent hash of a UUID string such as dcc549d8-bd0c-49c2-bff8-f6fa80fb7857
, preferably a number between 0 to N.
What is the best and fastest way to do it?
Update: I am thinking of using CRC32. Any pros/cons of it?
Consistent hashing optimization The aim is just to ensure each node is responsible for an equal portion of the ring, so that load is evenly distributed. (Another advantage of having multiple hashes for each node is that the hashes can be added to or removed from the ring gradually to avoid sudden spikes of load. )
So it is "consistent". However, MD5 takes as input an arbitrary sequence of bits, and outputs a sequence of 128 bits. In many situations, you want strings. For instance, you want to hash a password, and the password is initially a string.
In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only keys need to be remapped on average where is the number of keys and. is the number of slots.
Hash rings take the leading bits of a document's hash value and use this to determine which node the document should be assigned. This allows any node in a cluster to know what node the data lives on and how to adapt to new assignment methods as your data grows.
What kind of hash would you like? The 'best' choice might not be fastest and will depend on what you're using the hash for.
For md5, you could do:
var crypto = require('crypto');
var md5sum = crypto.createHash('md5');
md5sum.update(uuid);
var b64 = md5sum.digest('base64')
You could then use a base64 library to convert it to a number if that's what you need.
Node crypto stuff, including other hashing algorithms that might be more appropriate for your case (md5 is faster but less secure), is documented here: https://nodejs.org/api/crypto.html
If thinking about the nature of UUID example. I would go in that direction.
const INIT_NUMBER = 271;
function hash(uuid, N) {
const x = uuid.split("-").reduce((a,b) => a ^ Number.parseInt(b, 16), INIT_NUMBER) ;
return arguments.length === 1 ? x : x % N;
}
const a = hash("dcc549d8-bd0c-49c2-bff8-f6fa80fb7857");
const b = hash("dcc549d8-bd0c-49c2-bff8-f6fa80fb7857", 256);
console.log(a, b);
If you're using v4 UUID generation, all digits except for two already contain pseudorandom values, all you need to do is extract them to the form you want.
From the Spec:
4.4. Algorithms for Creating a UUID from Truly Random or
Pseudo-Random Numbers
The version 4 UUID is meant for generating UUIDs from truly-random or
pseudo-random numbers.
The algorithm is as follows:
o Set the two most significant bits (bits 6 and 7) of the
clock_seq_hi_and_reserved to zero and one, respectively.
o Set the four most significant bits (bits 12 through 15) of the
time_hi_and_version field to the 4-bit version number from
Section 4.1.3.
o Set all the other bits to randomly (or pseudo-randomly) chosen
values.
To extract the pseudorandom values, we simply remove the first hex character (which contains the 4 most significant bits) of the third and fourth sections separated by "-" and convert it to the base you want. Because a UUID is always the same length, we can remove the same indexes each time (14 and 19).
Unfortunately, as Javascript only supports up to 32-bit integers, we need to feed Number.parseInt
groups of 8 hex characters (32 bits) separately, then add the modulos to minimize bias.
Therefore, our code would be:
function parseUUID(uuid, N) {
var pseudorandomBytes = uuid.substring(0, 14) + uuid.substring(15, 19) + uuid.substring(20); // Splice out the bytes which contain metadata
pseudorandomBytes = pseudorandomBytes.replace(/-/g, ""); // Remove all dashes
var accumulator = 0; // Accumulate the sums modulo N of every eight hex characters and the remainder
pseudorandomBytes.match(/.{1,8}/g).forEach(function (a) { accumulator = (accumulator + (Number.parseInt(a, 16) % N)) % N; });
return accumulator; // Return the result
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With