I have pairs of hash values like
- 128ecf542a35ac5270a87dc740918404;d603ac0c04b9d08974482ae7fd4cf55
- a1288b1c7e2257a90bad9bdfb7690fbb;f23828e312d90cb7fdadd6479236119c
- ................................;................................
I would like to make each pair comparable with other, meaning:
128ecf542a35ac5270a87dc740918404;d603ac0c04b9d08974482ae7fd4cf55d
stays as it is;
in the case of
d603ac0c04b9d08974482ae7fd4cf55d;128ecf542a35ac5270a87dc74091840
4, it should become
128ecf542a35ac5270a87dc740918404;d603ac0c04b9d08974482ae7fd4cf55d
My main goal is to have a certain function which compares two hash values of one pair and returns a pair with values ordered in it according to some rule. The rule itself does not matter, the only requirement is, it should be extremely fast and should always give the same result given that input is either (unique1,unique2) or (unique2,unique1)
Thx!
One obvious , but inefficient way to do it would be to sum only the numbers contained in each hash value and compare them and put the hash value with the smaller sum as the first element in the pair and the bigger one in the second position.
Just compare the two strings with the usual string comparison (compareTo) and put the smaller one first. This will guarantee what you want. I expect this to be very cheap, because in practice the hash values will differ already in the first few chars, and the comparison then doesn't need to look at the rest of the strings. Furthermore, accessing and comparing a very low number of bytes (as in your example) is so cheap anyway that you will be able to see a performance impact only if you do this very often compared to the other operations your program does.
If you don't have the values as strings, but as byte arrays or something like this, just implement a simple lexicographical comparison on your own.
Edit: I've just read through your question again and main goal. My answer is for comparing two pairs.
If you want to order the two hash values in a pair really go with linearly comparing the two hash and on the first mismatch of characters put the lower value first. Chances are big that values will differ on the first few bytes already.
I don't think you can do it in Java but you may create a function for it in C/ASM then use it from Java through JNI.
Let's see: These are pairs of 32 byte hash values. You know where the pairs are separated at ';' byte 32.
Let be your coupled hash values as hahsA := (hA1;hA2) and hashB := (hB1;hB2)
You take the first pair:
1) get 0-15 and 33-49 bytes to two SSE registers and XOR it
hashA[0-15] := ( hashA1[0-15] XOR hashA2[0-15] ).
2) get 16-31 and 50-65 bytes to SSE registers and XOR it too
hashA[16-31] := ( hashA1[16-31] XOR hashA2[16-31] ).
Now you have two SSE registers filled with the XOR of the first and second 16 bytes of your couple:
XMM0: hashA[0-15]
XMM1: hashA[16-31]
You take the second pair and do the same thing. You have 8 SSE registers so you may not need to get the the value out of the register after 2).
Now you have:
XMM0: hA[0-15]
XMM1: hA[16-31]
XMM2: hB[0-15]
XMM3: hB[16-31]
if( XMM0 == XMM2 AND XMM1 == XMM3 ) then hA and hB are equal.
Of course if your hash codes are grater than 32 bytes, you need to make further steps, but assuming your hashcodes are all the same length, this algorithm takes:
1) load two 16 bytes and xor = 3 instructions 2) 3 instructions
1) and 2) again for second couple: 6 instructions finally 2 instructions for compare and 1 instruction for AND
That sums up for something like 15 instructions for each comparing couples, not measuring function calls. I cannot say it is O(1) because if hash values are greater the number of instructions are growing as well.
If efficiency really matters this is a way to go. But anyways, I don't think you can do it less than O(n) because you have to compare all the characters.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With