I'd like to compare some large objects representing trees and cache something to avoid comparing each time the new object with one already existing...
The question is what would be the best something ? (a compromise between performance and collisions...).
On the one hand, I have a regular hashCode function based on the value of various fields (following the chapter 3 of effective Java. But I'm not able to evaluate the potential collisions entailed by such an approach.
On the other hand, I have the MessageDigest approach from the standard java distribution with SHA-1 algorithm. I presume it's not going to be efficient but I may have less collision. Am I right ? Is it a correct solution in my context or am I completely wrong ?
The thing is that I don't know what would be the size of the objects. Please also note that the value computed is not going to be used in a HashTable.
thx...
In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160-bit (20-byte) hash value known as a message digest – typically rendered as 40 hexadecimal digits.
A proof-of-concept attack has been pioneered that “fully and practically” breaks the Secure Hash Algorithm 1 (SHA-1) code-signing encryption, used by legacy computers to sign the certificates that authenticate software downloads and prevent man-in-the-middle tampering.
The message digest length in MD5 can be up to 128 bits. The message digest length for SHA1 can be up to 160 bits. MD5 is faster than SHA.
The hash size for the SHA1 algorithm is 160 bits.
Because of the birthday problem, the chance of a collision depends on how many items you are working with.
The 160-bit space of SHA-1 is so large that I doubt you could ever have enough items to see a collision.
The 32-bit space of hashCode()
should not have a significant number of collisions until you have over 50,000 items. However, this depends on using a good hash algorithm.
In order to apply a cryptographic digest like SHA-1, you'll need to convert your graph to a string of bytes, which is likely to be computationally expensive, and could be complicated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With