I hear this term sometimes and am wondering what it is used for?
Hashing is a cryptographic process that can be used to validate the authenticity and integrity of various types of input. It is widely used in authentication systems to avoid storing plaintext passwords in databases, but is also used to validate files, documents and other types of data.
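To make the password case concrete, here is a minimal sketch using only Python's standard library. It stores a random salt and a derived digest instead of the password itself. The helper names `hash_password` and `verify_password` and the iteration count are assumptions chosen for illustration, not a recommendation for any specific system.

```python
import hashlib
import hmac
import os
from typing import Optional

# Illustrative work factor (an assumption); real systems tune this value.
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple:
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)


# Only the salt and digest would be stored, never the plaintext password.
salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

A deliberately slow, salted function is used here rather than a single plain hash, because a fast hash makes guessing stolen passwords much cheaper.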
Hash values are used to identify and filter duplicate files (e.g. emails, attachments, and loose files) from an ESI (electronically stored information) collection, or to verify that a forensic image or clone was captured successfully. Each hashing algorithm uses a specific number of bytes to store a “thumbprint” of the contents.
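As a rough illustration of duplicate filtering, the sketch below groups byte strings by their SHA-256 digest. The file names and contents are made up for the example, and `group_duplicates` is a hypothetical helper, not part of any e-discovery tool.

```python
import hashlib
from collections import defaultdict

# SHA-256 always produces a 32-byte "thumbprint", whatever the input size.
DIGEST_BYTES = hashlib.sha256().digest_size


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def group_duplicates(items: dict) -> list:
    """Group item names whose contents hash to the same digest."""
    by_digest = defaultdict(list)
    for name, content in items.items():
        by_digest[sha256_hex(content)].append(name)
    # Keep only digests shared by more than one item.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Made-up sample data standing in for collected files.
files = {
    "report.txt": b"quarterly numbers",
    "report_copy.txt": b"quarterly numbers",
    "notes.txt": b"meeting notes",
}
duplicates = group_duplicates(files)
print(duplicates)
```

The same comparison works for verifying a forensic copy: hash the source and the copy, and check that the two digests are equal.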
Hashing is a function that takes data of arbitrary size and produces output of a fixed (usually very small) size. There are many different types of hashes, but if we are talking about image hashing, it is used either to:
Images that look identical to us can be very different if you compare just the raw bytes. This can be due to:
Even if two images differ in just one byte, applying a hash function to them can give very different results (for hashes like MD5 or SHA, the results will almost certainly be completely different).
So you need a hash function that creates a similar (or even identical) hash for similar images. One generic approach is locality-sensitive hashing. But we know what kinds of problems images can have, so we can come up with a more specialized kind of hash.
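This effect is easy to see for yourself. The sketch below flips a single bit in some sample bytes (a stand-in for an image file, not a real image) and counts how many bits of the SHA-256 digest change. `bit_difference` is a hypothetical helper written for this example.

```python
import hashlib

# Stand-in for the raw bytes of an image file.
original = bytearray(b"pretend these bytes are an image file")
modified = bytearray(original)
modified[0] ^= 0x01  # flip a single bit in the first byte


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


digest_original = hashlib.sha256(original).digest()
digest_modified = hashlib.sha256(modified).digest()
differing_bits = bit_difference(digest_original, digest_modified)

print(digest_original.hex())
print(digest_modified.hex())
print(differing_bits, "of", len(digest_original) * 8, "bits differ")
```

For a cryptographic hash you should expect roughly half of the digest bits to change, so the two digests tell you nothing about how similar the inputs were.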
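To give a feel for what such a specialized hash can look like, here is a minimal sketch of one simple idea, usually called an average hash: each pixel becomes one bit depending on whether it is brighter than the image's mean. It assumes the image has already been reduced to a tiny grayscale grid (real implementations resize and convert first), and the 4x4 pixel values are invented for the example.

```python
def average_hash(pixels: list) -> int:
    """Build a bit string: 1 where a pixel is brighter than the mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Invented 4x4 grayscale "image": dark left half, bright right half.
image = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# Same picture, slightly brighter: every raw value differs.
brighter = [[min(255, p + 5) for p in row] for row in image]
# A visually different picture: light and dark areas swapped.
inverted = [[255 - p for p in row] for row in image]

print(hamming_distance(average_hash(image), average_hash(brighter)))
print(hamming_distance(average_hash(image), average_hash(inverted)))
```

Similar images end up a small Hamming distance apart and different images a large one, so comparison becomes a distance threshold rather than an equality check.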
The most well-known algorithms are:
By the way, if you use Python, all these hashes are already implemented in this library.