
Hashing, MurmurHash

Tags:

algorithm

I used MurmurHash to hash around 800,000 string values, and this produced many collisions: around 17 cases where different strings give the same hash value. Is this normal? Does anyone know the quality of the MurmurHash function?

Bashar Haddad asked Mar 07 '11 06:03


People also ask

How does MurmurHash work?

MurmurHash can return a negative value when its result is stored in a signed integer. A common way to get a non-negative value is to bitwise AND the result with 0x7fffffff, that is, value & 0x7fffffff. When the input is non-negative, the original value is returned. When the input is negative, the result is the original value with its sign bit cleared, which is not its absolute value.
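The masking step described above can be sketched in a few lines. This is only an illustration, assuming the hash is a signed 32-bit integer; the helper name `to_non_negative` is hypothetical and not part of any MurmurHash library.

```python
def to_non_negative(value: int) -> int:
    """Clear the sign bit of a signed 32-bit hash value (hypothetical helper)."""
    return value & 0x7FFFFFFF


# A non-negative input comes back unchanged.
print(to_non_negative(42))

# A negative input becomes non-negative, but the result is not abs(value):
# only the top bit of the 32-bit two's-complement pattern is cleared.
print(to_non_negative(-5), abs(-5))
```

Note that masking leaves only 31 usable bits, so it slightly raises the chance that two different hash values end up equal.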

Which hashing is best?

Google recommends using stronger hashing algorithms such as SHA-256 and SHA-3. Other options commonly used in practice are bcrypt , scrypt , among many others that you can find in this list of cryptographic algorithms.

What is a non-cryptographic hash?

Non-cryptographic hash functions just try to avoid collisions for non-malicious input. Some aim to detect accidental changes in data (CRCs), others try to put objects into different buckets in a hash table with as few collisions as possible. In exchange for weaker guarantees they are typically (much) faster.

Is MurmurHash deterministic?

MurmurHash is deterministic, so a user ID will always map to the same variation as long as the experiment conditions don't change. This also means that any SDK will always output the same variation, as long as user IDs and user attributes are consistently shared between systems.


2 Answers

Check this excellent answer on programmers.stackexchange.com, which compares various hash algorithms, including MurmurHash2 (but not MurmurHash3), for speed, collisions, and randomness.

JBentley answered Sep 27 '22 23:09


This comparison of hashing functions seems to indicate that MurmurHash generates roughly the same number of collisions as alternative hashes over a wide range of input data.
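To judge whether a given collision count is unusual, it helps to compare it with what an ideal hash would produce. The sketch below uses the standard birthday-problem estimate for the expected number of colliding pairs. It assumes a uniformly distributed hash and a 32-bit output; the question does not say which MurmurHash variant or output width was used, so the 32-bit figure is an assumption, and the function name is illustrative.

```python
def expected_colliding_pairs(n: int, bits: int = 32) -> float:
    """Expected number of colliding pairs among n distinct keys,
    assuming an ideal hash that is uniform over 2**bits values."""
    pairs = n * (n - 1) / 2      # number of distinct key pairs
    return pairs / 2 ** bits     # each pair collides with probability 2**-bits


# Assumed scenario from the question: about 800,000 keys.
print(expected_colliding_pairs(800_000, bits=32))  # roughly 74.5
print(expected_colliding_pairs(800_000, bits=64))  # far below one
```

Under these assumptions, a perfectly random 32-bit hash would be expected to yield on the order of 74 colliding pairs for 800,000 keys, so a count of around 17 does not by itself point to a weak hash function. If collisions must be rarer, a wider output (64 or 128 bits) is the usual remedy.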

AShelly answered Sep 27 '22 22:09