I've been using hash and RSA for a time(on a very superficial level, for ex: RSA authentication on a SSH connection), and I want to learn more about it.
To begin with, I know that encryption is a two-way process that can be reverted. And hashing is a one-way process that is irreversible.
That last point just doesn't make sense to me, if I use an algorithm to hash "hello", wouldn't the same algorithm, but "reversed"(meaning, it works "backwards"), be able to convert that hash to "hello" again.
EDIT:
Thanks to @GeorgDangl, @klutt, and Pete Kirkham for pointing out that I didn't understand at all the concept of "irreversible math". The examples were really helpful.
It is irreversible in the sense that for each input you have exactly one output, but not the other way around. There are multiple inputs that yields the same output.
For any given input, there's a lot (infinite in fact) different inputs that would yield the same hash. This is easy to realize since the output has a fixed size, but the input doesn't have size restrictions.
To achieve this, irreversible math is used. For instance, it is easy to calculate 10%3
. The answer to that is simply 10%3=1
. But if I give you the equation x%3=1
, what would you do? This equation is true for all x=3*k+1
. Thus, you cannot get the number I started with.
Another example of irreversible math is sine and cosine. For instance, cos(0)=1
, but there are more input values that evaluates to 1. Actually, cos(n*2pi)=1
. There are "inverses" to these functions, but they either gives an answer in a certain range or a multivalued answer. A third example is x²=1
. This is true for both x=1
and x=-1
. However, in this example you get a finite (and also rather small) number of possible answers.
When dealing with encryption, one could say that the private key is used to pick the right solution. You can always quite trivially decrypt an encrypted message, but you will get a huge load of possible answers. The key is used to find the right one, rather than actually decrypting.
Another thing worth mentioning is that a good hash algorithm minimizes for collisions, that is, two inputs generating the same hash. Also, when it comes to cryptography, you want it to be as hard as possible to reverse it. But this also comes with the cost that algorithms are cpu intensive.
A very basic and insecure hash algorithm could look like this pseudocode:
hash = 0
for each byte in input:
hash = hash + byte
Here I assume that hash
is a simple integer variable that wraps around when it comes to it's maximum value. The algorithm is easy to implement, and it's quick. But you don't want to use it if security is important. It's typically very easy to modify a file so that this hash check won't detect it.
Real cryptographic hash algorithms strives to achieve the property that if you change any single bit in the input, every single bit in the output have a 50% chance of flipping. Furthermore, if you flip two bits in the input, the bits flipped in the output will be totally unrelated to which bits would flip if you just changed the bits one by one.
I found a good youtube video on the subject: https://www.youtube.com/watch?v=yoMOAIzBSpY
Trivial example - say that for our irreversible function, we take the number we are input and return the value modulo 7.
hash( 0) => 0
hash( 1) => 1
hash( 2) => 2
hash( 3) => 3
hash( 4) => 4
hash( 5) => 5
hash( 6) => 6
hash( 7) => 0
hash( 8) => 1
hash( 9) => 2
hash(10) => 3
hash(12) => 4
hash(13) => 5
hash(14) => 6
so if the hash value is 6, you don't know if the input was 6, or 14, or any value of 6 + 7 * N where N is an integer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With