Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash Function For Sequence of Unique Ids (UUID)

I am storing message sequences in the database each sequence can have up to N number of messages. I want to create a hash function which will represent the message sequence and enable to check faster if message sequence exists.

Each message has a case-sensitive alphanumeric universal unique id (UUID). Consider following messages (M1, M2, M3) with ids-

M1 - a3RA0000000e0taBB M2 - a3RA00033000e0taC M3 - a3RA0787600e0taBB

Message sequences can be

Sequence-1 : (M1,M2,M3) Sequence-2 : (M1,M3,M2) Sequence-3 : (M2,M1,M3) Sequence-4 : (M1,M2) Sequence-5 : (M2,M3) ...etc...

Following is the database structure example for storing message sequence

enter image description here

Given the message sequence, we need to check whether that message sequence exists in the database. For example, check if message sequence M1 -> M2 -> M3 i.e. with UIDs (a3RA0000000e0taBB -> a3RA00033000e0taC -> a3RA0787600e0taBB) exists in the database.

Instead of scanning the rows in the table, I want to create a hash function which represents the message sequence with a hash value. Using the hash value lookup in the table supposedly faster.

My simple hash function is- enter image description here

I am wondering what would be an optimal hash function for storing the message sequence hash for faster is exists check.

like image 883
Anwar Shaikh Avatar asked Aug 20 '18 22:08

Anwar Shaikh


People also ask

Is hash of UUID unique?

It depends on the algorithm. A hash, by definition, is 'less' unique than the original.

Is UUID a hash function?

Version-3 and version-5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1.

Does UUID use MD5?

UUIDs versions 3 and 5 differ primarily in that they use different hashing algorithms. UUID v3 uses MD5, and UUID v5 uses SHA-1.

What is unique hash function?

A Hashing function is just like an ordinary function that you will see in any programming language except that it takes some data as input and returns a fixed unique string(a set of numbers & characters) as an output. And that string is what we call a hash.


1 Answers

You don't need a full-blown cryptographic hash, just a fast one, so how about having a look at FastHash: https://github.com/ZilongTan/Coding/tree/master/fast-hash. If you believe 32 or 64 bit hashes are not enough (i.e. produce too many collisions) then you could use the longer MurmurHash: https://en.wikipedia.org/wiki/MurmurHash (actually, the author of FastHash recommends this approach)

There's a list of more algorithms on Wikipedia: https://en.wikipedia.org/wiki/List_of_hash_functions#Non-cryptographic_hash_functions

In any case, hashes using bit operations (SHIFT, XOR ...) should be faster than the multiplication in your approach, even on modern machines.

like image 98
memo Avatar answered Oct 09 '22 10:10

memo