 

How do I safely convert an MD5 hash into buckets in Java/Scala?

Tags:

java

hash

md5

scala

I would like to hash IDs into buckets such that:

  1. There is no bias towards a particular bucket
  2. The same ID should always be assigned to the same bucket
  3. IDs should be distributed across all buckets independently
  4. Buckets should be (almost) equal in size

My strategy is to get an MD5 hash of the ID, convert it into a number and then mod it into a bucket.

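import java.math.BigInteger
import java.security.MessageDigest
val hash: Array[Byte] = MessageDigest.getInstance("MD5").digest("Hello".getBytes("UTF-8"))
val number: BigInteger = new BigInteger(1, hash) // signum 1: read the 16 digest bytes as an unsigned number
val bucket: Int = number.mod(BigInteger.valueOf(1000)).intValue()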
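The same approach can be packaged as a reusable function. The following is a minimal sketch; the names `Md5Buckets` and `bucketOf` are hypothetical and not part of any library. It assumes UTF-8 encoding of the ID and a positive bucket count, and the `main` method is only a rough balance check over synthetic IDs, not a rigorous statistical test.

```scala
import java.math.BigInteger
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object Md5Buckets {
  // Hypothetical helper: maps an ID to a bucket in [0, numBuckets)
  // by taking the MD5 digest and reducing it modulo numBuckets.
  def bucketOf(id: String, numBuckets: Int): Int = {
    require(numBuckets > 0, "numBuckets must be positive")
    // MessageDigest is not thread-safe, so a fresh instance is created per call.
    val digest = MessageDigest.getInstance("MD5").digest(id.getBytes(StandardCharsets.UTF_8))
    // signum = 1 interprets the 16 digest bytes as a non-negative 128-bit integer.
    val number = new BigInteger(1, digest)
    // BigInteger.mod always returns a value in [0, numBuckets).
    number.mod(BigInteger.valueOf(numBuckets.toLong)).intValue()
  }

  def main(args: Array[String]): Unit = {
    // Rough balance check: count how many synthetic IDs land in each bucket.
    val counts = new Array[Int](1000)
    for (i <- 0 until 100000) counts(bucketOf(s"user-$i", 1000)) += 1
    println(s"min=${counts.min} max=${counts.max}")
  }
}
```

Because the bucket depends only on the ID's bytes and the bucket count, the same ID always lands in the same bucket (goal 2) as long as the encoding and `numBuckets` stay fixed.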

Does this approach maintain the nice properties that MD5 provides? Would this achieve the goals above?
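Goal 4 can only be met "almost", because $2^{128}$ is not a multiple of 1000. A short calculation shows how small the imbalance is. It assumes the digest is read as an unsigned 128-bit integer and that MD5 outputs behave like uniformly random 128-bit values:

$$2^{128} = 340282366920938463463374607431768211456 = 1000\,q + r,\qquad r = 456,\quad q = 340282366920938463463374607431768211$$

Buckets $0,\dots,455$ each receive $q+1$ of the $2^{128}$ possible hash values and buckets $456,\dots,999$ receive $q$, so

$$\frac{P(\text{largest bucket})}{P(\text{smallest bucket})} = \frac{q+1}{q} \approx 1 + 2.9\times 10^{-36}.$$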

user1170883 asked Nov 10 '22 03:11


1 Answer

Your approach is sound (if slow), and maintains all the good properties of MD5 except collision resistance.
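If speed matters, one option (a variation not taken from the answer) is to skip `BigInteger` and reduce only the first 8 bytes of the digest, read as a `Long`. This is a sketch under stated assumptions: the names `FastMd5Buckets` and `bucketOf` are hypothetical, the speed-up is not measured here, and the bucket numbers differ from those of the full-digest version, so the two cannot be mixed for existing data. The modulo bias stays negligible, since $2^{64} \bmod 1000 = 616$ over roughly $1.8 \times 10^{19}$ possible prefixes.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object FastMd5Buckets {
  // One MessageDigest per thread, because MessageDigest is not thread-safe.
  private val md5: ThreadLocal[MessageDigest] = new ThreadLocal[MessageDigest] {
    override def initialValue(): MessageDigest = MessageDigest.getInstance("MD5")
  }

  // Hypothetical helper: uses only the first 64 bits of the MD5 digest.
  def bucketOf(id: String, numBuckets: Int): Int = {
    require(numBuckets > 0, "numBuckets must be positive")
    // digest(bytes) also resets the MessageDigest, so the instance can be reused.
    val digest = md5.get().digest(id.getBytes(StandardCharsets.UTF_8))
    // First 8 bytes as a big-endian signed Long (may be negative).
    val prefix = ByteBuffer.wrap(digest).getLong()
    // Unlike %, floorMod never returns a negative result for a positive divisor.
    Math.floorMod(prefix, numBuckets.toLong).toInt
  }
}
```

Reading the prefix as a signed `Long` does not hurt balance: the possible values still form one contiguous range of $2^{64}$ integers, so every bucket receives either $\lfloor 2^{64}/N \rfloor$ or $\lceil 2^{64}/N \rceil$ of them.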

A lack of collision resistance is rarely a concern in a bucket selection algorithm, though. Exploiting it requires that the system (1) buckets millions of IDs supplied by an untrusted party, and (2) depends on a roughly uniform distribution for reliability or correctness.

that other guy answered Nov 14 '22 23:11