I feel I have a pretty good understanding of hash functions and the contracts they entail.
SHA1 on Input X will ALWAYS produce the same output. You could use a Python library, a Java library, or pen and paper. It's a function, it is deterministic. My SHA1 does the same as yours and Alice's and Bob's.
As I understand it, AES is also a function. You put in some values, it spits out the ciphertext.
Why, then, could there ever be fears that Truecrypt (for instance) is "broken"? They're not saying AES is broken, they're saying the program that implements it may be. AES is, in theory, solid. So why can't you just run a file through Truecrypt, run it through a "reference AES" function, and verify that the results are the same? I know it absolutely does not work like that, but I don't know why.
What makes AES different from SHA1 in this way? Why might Truecrypt AES spit out a different file than Schneier-Ifier* AES, when they were both given all the same inputs?
In the end, my question boils down to:
My_SHA1(X) == Bobs_SHA1(X) == ...etc
But TrueCrypt_AES(X) != HyperCrypt_AES(X) != VeraCrypt_AES(X) etc. Why is that? Do all those programs wrap AES, but have different ways of determining stuff like an initialization vector or something?
*this would be the name of my file encryption program if I ever wrote one
In the SHA-1 example you give, there is only a single input to the function, and any correct SHA-1 implementation should produce the same output as any other when provided the same input data.
For AES however things are a bit tricker, and since you don't specify what you mean exactly by "AES", this itself seems likely to be the source of the perceived differences between implementations.
Firstly, "AES" isn't a single algorithm, but a family of algorithms that take different key sizes (128, 192 or 256 bits). AES is also a block cipher, it takes a single block of 128 bits/16 bytes of plaintext input, and encrypts this using the key to produce a single 16 byte block of output.
Of course in practice we often want to encrypt more than 16 bytes of data at once, so we must find a way to repeatedly apply the AES algorithm in order to encrypt all the data. Naively we could split it into 16 byte chunks and encrypt each one in turn, but this mode (described as Electronic Codebook or ECB) turns out to be horribly insecure. Instead, various other more secure modes are usually used, and most of these require an Initialization Vector (IV) which helps to ensure that encrypting the same data with the same key doesn't result in the same ciphertext (which would otherwise leak information).
Most of these modes still operate on fixed-sized blocks of data, but again we often want to encrypt data that isn't a multiple of the block size, so we have to use some form of padding, and again there are various different possibilities for how we pad a message to a length that is a multiple of the block size.
So to put all of this together, two different implementations of "AES" should produce the same output if all of the following are identical:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With