I am storing message sequences in the database each sequence can have up to N
number of messages. I want to create a hash function which will represent the message sequence and enable to check faster if message sequence exists.
Each message has a case-sensitive alphanumeric universal unique id (UUID).
Consider following messages (M1, M2, M3)
with ids-
M1 - a3RA0000000e0taBB
M2 - a3RA00033000e0taC
M3 - a3RA0787600e0taBB
Message sequences can be
Sequence-1 : (M1,M2,M3)
Sequence-2 : (M1,M3,M2)
Sequence-3 : (M2,M1,M3)
Sequence-4 : (M1,M2)
Sequence-5 : (M2,M3)
...etc...
Following is the database structure example for storing message sequence
Given the message sequence, we need to check whether that message sequence exists in the database. For example, check if message sequence M1 -> M2 -> M3
i.e. with UIDs (a3RA0000000e0taBB -> a3RA00033000e0taC -> a3RA0787600e0taBB)
exists in the database.
Instead of scanning the rows in the table, I want to create a hash function which represents the message sequence with a hash value. Using the hash value lookup in the table supposedly faster.
My simple hash function is-
I am wondering what would be an optimal hash function for storing the message sequence hash for faster is exists check.
It depends on the algorithm. A hash, by definition, is 'less' unique than the original.
Version-3 and version-5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1.
UUIDs versions 3 and 5 differ primarily in that they use different hashing algorithms. UUID v3 uses MD5, and UUID v5 uses SHA-1.
A Hashing function is just like an ordinary function that you will see in any programming language except that it takes some data as input and returns a fixed unique string(a set of numbers & characters) as an output. And that string is what we call a hash.
You don't need a full-blown cryptographic hash, just a fast one, so how about having a look at FastHash: https://github.com/ZilongTan/Coding/tree/master/fast-hash. If you believe 32 or 64 bit hashes are not enough (i.e. produce too many collisions) then you could use the longer MurmurHash: https://en.wikipedia.org/wiki/MurmurHash (actually, the author of FastHash recommends this approach)
There's a list of more algorithms on Wikipedia: https://en.wikipedia.org/wiki/List_of_hash_functions#Non-cryptographic_hash_functions
In any case, hashes using bit operations (SHIFT, XOR ...) should be faster than the multiplication in your approach, even on modern machines.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With