Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java Hash values: how to make them efficiently comparable?

I have pairs of hash values like

  1. 128ecf542a35ac5270a87dc740918404;d603ac0c04b9d08974482ae7fd4cf55
  2. a1288b1c7e2257a90bad9bdfb7690fbb;f23828e312d90cb7fdadd6479236119c
  3. ................................;................................

I would like to make each pair comparable with other, meaning:

128ecf542a35ac5270a87dc740918404;d603ac0c04b9d08974482ae7fd4cf55d

stays as it is;

in the case of

d603ac0c04b9d08974482ae7fd4cf55d;128ecf542a35ac5270a87dc74091840

4, it should become

128ecf542a35ac5270a87dc740918404;d603ac0c04b9d08974482ae7fd4cf55d

My main goal is to have a certain function which compares two hash values of one pair and returns a pair with values ordered in it according to some rule. The rule itself does not matter, the only requirement is, it should be extremely fast and should always give the same result given that input is either (unique1,unique2) or (unique2,unique1)

Thx!

One obvious , but inefficient way to do it would be to sum only the numbers contained in each hash value and compare them and put the hash value with the smaller sum as the first element in the pair and the bigger one in the second position.

like image 718
Bob Avatar asked Jul 01 '26 00:07

Bob


2 Answers

Just compare the two strings with the usual string comparison (compareTo) and put the smaller one first. This will guarantee what you want. I expect this to be very cheap, because in practice the hash values will differ already in the first few chars, and the comparison then doesn't need to look at the rest of the strings. Furthermore, accessing and comparing a very low number of bytes (as in your example) is so cheap anyway that you will be able to see a performance impact only if you do this very often compared to the other operations your program does.

If you don't have the values as strings, but as byte arrays or something like this, just implement a simple lexicographical comparison on your own.

like image 73
Philipp Wendler Avatar answered Jul 03 '26 13:07

Philipp Wendler


Edit: I've just read through your question again and main goal. My answer is for comparing two pairs.

If you want to order the two hash values in a pair really go with linearly comparing the two hash and on the first mismatch of characters put the lower value first. Chances are big that values will differ on the first few bytes already.


I don't think you can do it in Java but you may create a function for it in C/ASM then use it from Java through JNI.

Let's see: These are pairs of 32 byte hash values. You know where the pairs are separated at ';' byte 32.

Let be your coupled hash values as hahsA := (hA1;hA2) and hashB := (hB1;hB2)

You take the first pair:

1) get 0-15 and 33-49 bytes to two SSE registers and XOR it

hashA[0-15] := ( hashA1[0-15] XOR hashA2[0-15] ).

2) get 16-31 and 50-65 bytes to SSE registers and XOR it too

hashA[16-31] := ( hashA1[16-31] XOR hashA2[16-31] ).

Now you have two SSE registers filled with the XOR of the first and second 16 bytes of your couple:

XMM0: hashA[0-15]
XMM1: hashA[16-31]

You take the second pair and do the same thing. You have 8 SSE registers so you may not need to get the the value out of the register after 2).

Now you have:

XMM0: hA[0-15]
XMM1: hA[16-31]
XMM2: hB[0-15]
XMM3: hB[16-31]


if( XMM0 == XMM2 AND XMM1 == XMM3 ) then hA and hB are equal.

Of course if your hash codes are grater than 32 bytes, you need to make further steps, but assuming your hashcodes are all the same length, this algorithm takes:

1) load two 16 bytes and xor = 3 instructions 2) 3 instructions

1) and 2) again for second couple: 6 instructions finally 2 instructions for compare and 1 instruction for AND

That sums up for something like 15 instructions for each comparing couples, not measuring function calls. I cannot say it is O(1) because if hash values are greater the number of instructions are growing as well.

If efficiency really matters this is a way to go. But anyways, I don't think you can do it less than O(n) because you have to compare all the characters.

like image 43
illEatYourPuppies Avatar answered Jul 03 '26 14:07

illEatYourPuppies



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!