Say I have a blob of text 5,000 characters long. I run it through a hashing program and it generates a hash 40 characters long. Now I run another blob of text, 10,000 characters long. It still generates a hash 40 characters long. That's true for text of any length.
My question is: if the hashes are all unique, wouldn't I be able to compress anything into a 40-character string?
Hashing is not unique.
A hash function tries to produce a different hash for each value fed to it, but uniqueness is not guaranteed.
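To see why uniqueness can't be guaranteed, it helps to count. Here is a rough sketch in Python, assuming the 40-character hash in the question is a SHA-1 hex digest (the question doesn't name the algorithm, so that part is a guess; `sha1_hex` is just a helper name made up for this example). A 40-character hex string can take only 16^40 = 2^160 different values, while there are far more possible 5,000-byte inputs than that, so some inputs must share a hash.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of `text` as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


short_blob = "a" * 5000
long_blob = "b" * 10000

# The digest length does not depend on the input length.
print(len(sha1_hex(short_blob)))
print(len(sha1_hex(long_blob)))

# Number of distinct 40-character hex strings vs. number of distinct 5000-byte inputs.
possible_hashes = 16 ** 40
possible_inputs = 256 ** 5000

# More inputs than outputs means at least two inputs share a hash.
print(possible_inputs > possible_hashes)
```

Both lengths come out as 40, and the final comparison is true, which is the whole reason a hash can't act as a lossless compressor.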
Good hashing algorithms produce duplicate hash values much less frequently than bad ones. Also, hashing is one-directional, meaning you can't go from a hash -> original, so it's not meant for compression.
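Finding a duplicate for a full-size hash is impractical by brute force, but you can watch it happen with a deliberately tiny one. The sketch below is only an illustration: `tiny_hash` and `find_collision` are made-up names, and the "hash" is just the first 2 bytes of a SHA-1 digest, which leaves 65,536 possible values. Trying 65,537 different messages is therefore guaranteed to hit a duplicate, and in practice it happens much sooner.

```python
import hashlib


def tiny_hash(data: bytes, n_bytes: int = 2) -> str:
    """A deliberately weak hash: the first n_bytes of SHA-1, as hex."""
    return hashlib.sha1(data).hexdigest()[: n_bytes * 2]


def find_collision(n_bytes: int = 2):
    """Return two different messages with the same tiny_hash, plus that hash."""
    seen = {}
    # One more message than there are possible hash values forces a repeat.
    for i in range(256 ** n_bytes + 1):
        msg = str(i).encode()
        h = tiny_hash(msg, n_bytes)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    raise AssertionError("unreachable: more messages than hash values")


first, second, shared = find_collision()
print(first, second, shared)
```

Given only `shared`, there is no way to tell whether the original was `first`, `second`, or any of the other inputs that map to the same value, which is why you can't reverse a hash back into its input.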
And: A hash doesn't need to be unique. The same input needs to be transformed into the same hash by the algorithm. You don't use a hash as an identifier!
Not all hashes are guaranteed to be unique. The Wikipedia entry on the topic is pretty good: http://en.wikipedia.org/wiki/Hash_function