I used MurmurHash to hash around 800,000 string values, and this produced around 17 collisions (different strings giving the same hash value). Is this normal? Does anyone know the quality of the MurmurHash function?
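For a rough sense of scale, the birthday approximation gives the number of colliding pairs you would expect from an ideal, uniformly distributed hash. The sketch below assumes the output width in bits (the question does not say which MurmurHash variant or width was used), and the function name `expected_colliding_pairs` is made up for illustration.

```python
def expected_colliding_pairs(n_keys: int, hash_bits: int) -> float:
    """Birthday approximation: expected number of key pairs that share a
    hash value, assuming an ideal hash that is uniform over 2**hash_bits."""
    buckets = 2 ** hash_bits
    return n_keys * (n_keys - 1) / (2 * buckets)


n_keys = 800_000  # figure taken from the question
for bits in (31, 32, 64):  # assumed widths, not stated in the question
    print(bits, round(expected_colliding_pairs(n_keys, bits), 4))
```

Under the 32-bit assumption this comes out in the tens of colliding pairs, so a handful of collisions at this scale is not by itself a sign of a poor hash function.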
MurmurHash can return a negative value when the result is stored in a signed integer. To get a non-negative value, bitwise-AND the original value with 0x7fffffff, that is, value & 0x7fffffff. When the input is non-negative, the original value is returned unchanged. When the input is negative, the result is the original value with its sign bit cleared, which is not its absolute value.
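A minimal sketch of that masking step, assuming the hash is handed to you as a signed 32-bit integer. The helper name `to_non_negative` is hypothetical.

```python
def to_non_negative(value: int) -> int:
    """Clear the sign bit of a signed 32-bit value by masking with 0x7fffffff."""
    return value & 0x7FFFFFFF


print(to_non_negative(42))    # 42: non-negative input is returned unchanged
print(to_non_negative(-5))    # 2147483643: sign bit cleared, not abs(-5) == 5
print(to_non_negative(-1))    # 2147483647
```

Note that masking leaves only 31 usable bits, so two hashes that differ only in the sign bit map to the same value.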
Google recommends using stronger hashing algorithms such as SHA-256 and SHA-3. Other options commonly used in practice are bcrypt and scrypt, among many others that you can find in this list of cryptographic algorithms.
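If you want to try the cryptographic hashes named above, Python's standard `hashlib` module exposes both SHA-256 and SHA-3. This is only a sketch of the calls; the input string is an arbitrary example.

```python
import hashlib

data = "some string value".encode("utf-8")  # arbitrary example input

sha256_hex = hashlib.sha256(data).hexdigest()      # SHA-256, 256-bit digest
sha3_256_hex = hashlib.sha3_256(data).hexdigest()  # SHA-3 with 256-bit digest

print(sha256_hex)
print(sha3_256_hex)
```

Both digests are 256 bits wide, far wider than a 32-bit MurmurHash, so accidental collisions are not a practical concern, at the cost of speed.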
Non-cryptographic hash functions just try to avoid collisions for non-malicious input. Some aim to detect accidental changes in data (CRCs); others try to put objects into different buckets in a hash table with as few collisions as possible. In exchange for weaker guarantees, they are typically (much) faster.
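Both uses can be sketched with CRC-32 from Python's standard `zlib` module. CRC-32 is used here only because it ships with Python; the helper names and the bucket count are assumptions for illustration.

```python
import zlib


def checksum(data: bytes) -> int:
    """CRC-32 of the data, as an unsigned 32-bit integer."""
    return zlib.crc32(data)


def bucket_index(key: str, num_buckets: int) -> int:
    """Pick a hash-table bucket for a key by reducing its CRC-32 modulo the table size."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


original = b"hello world"
corrupted = b"hello worle"  # last byte differs by a single bit

# An accidental single-bit change yields a different checksum.
print(checksum(original) != checksum(corrupted))  # True

# The same key always lands in the same bucket.
print(bucket_index("user-42", 16))
```

Neither use resists an attacker who deliberately constructs colliding inputs; that is the guarantee being traded away for speed.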
MurmurHash is deterministic, so a user ID will always map to the same variation as long as the experiment conditions don't change. This also means that any SDK will always output the same variation, as long as user IDs and user attributes are consistently shared between systems.
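To illustrate the determinism, here is a sketch of hash-based bucketing. It assumes the 32-bit x86 variant of MurmurHash3 with seed 0, written out in pure Python so it runs without third-party packages. The key format, the function `assign_variation`, and the variation names are all hypothetical; a real SDK may combine IDs and map hashes to variations differently.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python sketch of MurmurHash3 (x86, 32-bit); returns an unsigned int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4

    # Body: mix each 4-byte little-endian block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: mix the remaining 0 to 3 bytes.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def assign_variation(user_id: str, experiment_id: str, variations: list) -> str:
    """Map a user to a variation; the same inputs always give the same result."""
    key = f"{experiment_id}:{user_id}".encode("utf-8")  # hypothetical key format
    return variations[murmur3_32(key) % len(variations)]


variations = ["control", "treatment"]
first = assign_variation("user-42", "exp-1", variations)
second = assign_variation("user-42", "exp-1", variations)
print(first, first == second)
```

Because the result depends only on the input bytes and the seed, any system that builds the same key and uses the same variation list will reach the same assignment.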
Check this excellent answer on programmers.stackexchange.com, which compares various hash algorithms, including MurmurHash2 (but not MurmurHash3), for speed, collisions, and randomness.
This comparison of hashing functions seems to indicate that MurmurHash generates roughly the same number of collisions as alternative hashes over a wide range of input data.