I'm trying to understand what algorithm Cassandra uses to generate murmur3 hashes of composite partition keys. I know I can obtain the value directly from CQL, but I want to reproduce Cassandra's behaviour for any given tuple directly from Java/Scala code.
For simple partition keys the following expression computes the correct value (at least in many cases; I know from reading the source code that it is not an exact match):
long l = com.google.common.hash.Hashing.murmur3_128().hashString("my-string", Charset.forName("UTF-8")).asLong();
What if the partition key has two columns?
Hashing the concatenation of the two strings does not give the same value.
Thanks for giving me more details about the algorithm. I wrote some sample code to share the solution:
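import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;
import com.google.common.hash.Hashing;

byte[] keyBytes;
try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
     DataOutputStream out = new DataOutputStream(bos)) {
    String[] keys = new String[] {"key1", "key2"};
    for (String key : keys) {
        byte[] arr = key.getBytes(StandardCharsets.UTF_8);
        out.writeShort(arr.length);    // 2-byte big-endian length of this component
        out.write(arr, 0, arr.length); // the component's UTF-8 bytes
        out.writeByte(0);              // single zero byte closing the component
    }
    out.flush();
    keyBytes = bos.toByteArray();
}
// Hash the serialized composite key and keep the first 64 bits
long hash = Hashing.murmur3_128().hashBytes(keyBytes).asLong();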
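To make the byte layout easier to check in isolation, here is a minimal sketch of the same serialization step using only the JDK. The class name `CompositeKeyBytes` and the method `serialize` are hypothetical names chosen for this illustration, and the sketch assumes every key component is a string encoded as UTF-8, as in the code above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CompositeKeyBytes {
    // Hypothetical helper: writes each component as
    // [2-byte big-endian length][UTF-8 bytes][one zero byte].
    static byte[] serialize(String... components) {
        byte[][] encoded = new byte[components.length][];
        int total = 0;
        for (int i = 0; i < components.length; i++) {
            encoded[i] = components[i].getBytes(StandardCharsets.UTF_8);
            if (encoded[i].length > 0xFFFF) {
                throw new IllegalArgumentException("component too long for a 2-byte length");
            }
            total += 2 + encoded[i].length + 1;
        }
        ByteBuffer buf = ByteBuffer.allocate(total); // ByteBuffer is big-endian by default
        for (byte[] e : encoded) {
            buf.putShort((short) e.length); // length prefix
            buf.put(e);                     // component bytes
            buf.put((byte) 0);              // closing zero byte
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("key1", "key2");
        System.out.println(bytes.length); // 2 * (2 + 4 + 1) = 14
    }
}
```

The resulting array can be passed to `Hashing.murmur3_128().hashBytes(...)` exactly as `keyBytes` is above. The length prefixes and closing bytes are why hashing the plain concatenation "key1key2" gives a different value: the hashed input is 14 bytes here, not 8.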