
Compression/encryption algorithms output guarantees

My question here regards compression/encryption algorithms in general, and to me it sounds like a complete noobie one. Now, I understand that "in general" "it all depends", but suppose we're talking about algorithms that all have reference implementations/published specs and are thoroughly standard. To be more specific, I'm using the .NET implementations of AES-256 and GZip/Deflate.

So here goes. Can it be assumed that, given exactly the same input, both types of algorithms will produce exactly the same output?

For example, will the output of aes(gzip("hello"), key, initVector) on .NET be identical to the output of the same call on a Mac or Linux?

Anton Gogolev asked Jan 19 '23 04:01



1 Answer

AES is rigorously defined, so given the same input, the same key, and the same IV (and the same mode of operation and padding scheme), you will get the same output on any platform.

The same cannot be said for zip.

The problem is not the standard. There IS a defined standard: the Deflate stream is IETF RFC 1951, the zlib wrapper is IETF RFC 1950, and the gzip wrapper is IETF RFC 1952, so anyone can produce a compatible zip compressor/decoder starting from these definitions.

But zip belongs to the large family of LZ compressors, where the mapping from input to compressed stream is, by construction, not one-to-one. The format specifies how to decode a stream, not how to encode one, so for a single input there are many different compressed streams that are all valid and all decode back to that input.
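As a small illustration of the three formats named above, here is a sketch using Python's standard zlib module (Python is my choice here, not the asker's .NET stack). The helper name deflate_as is made up for this example; the wbits values -15, 15 and 31 are zlib's way of selecting a raw Deflate stream, a zlib wrapper and a gzip wrapper respectively.

```python
import zlib

def deflate_as(payload: bytes, wbits: int) -> bytes:
    """Compress payload at level 6; wbits selects the container format."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return compressor.compress(payload) + compressor.flush()

payload = b"hello"
raw_stream = deflate_as(payload, -15)   # raw Deflate, RFC 1951
zlib_stream = deflate_as(payload, 15)   # zlib wrapper, RFC 1950
gzip_stream = deflate_as(payload, 31)   # gzip wrapper, RFC 1952

# The wrappers add their own header and checksum around the Deflate data,
# so the three byte strings differ even though the payload is identical.
for name, stream in [("raw", raw_stream), ("zlib", zlib_stream), ("gzip", gzip_stream)]:
    print(name, len(stream), stream.hex())
```

So "GZip" and "Deflate" in an API are not interchangeable byte for byte, which is one more thing to pin down when comparing output across platforms.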

An example. Let's say my input is: ABCABCABC

Valid outputs can be:

  • 9 literals

  • 3 literals followed by one 6-byte copy starting at offset -3 (the copy overlaps the bytes it is producing, which LZ allows)

  • 3 literals followed by two 3-byte copies, each starting at offset -3

  • 6 literals followed by one 3-byte copy starting at offset -6

  • etc.
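The encodings listed above can be checked with a toy decoder. This is only a sketch: the ("lit", bytes) and ("copy", length, offset) tuples are a made-up token format for illustration, not the actual Deflate bit layout, and offsets are written as positive distances back from the current end of the output.

```python
def decode(tokens):
    """Replay LZ-style tokens into the bytes they describe."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out += token[1]
        else:
            _, length, offset = token
            # Copy byte by byte so a copy may overlap its own output.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)

candidates = [
    [("lit", b"ABCABCABC")],                            # 9 literals
    [("lit", b"ABC"), ("copy", 6, 3)],                  # 3 literals + one 6-byte copy
    [("lit", b"ABC"), ("copy", 3, 3), ("copy", 3, 3)],  # 3 literals + two 3-byte copies
    [("lit", b"ABCABC"), ("copy", 3, 6)],               # 6 literals + one 3-byte copy
]

results = [decode(tokens) for tokens in candidates]
for tokens, result in zip(candidates, results):
    print(len(tokens), "tokens ->", result)
```

Four different token sequences, one decoded result: the decoder is fully determined, the encoder is free to pick.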

All these outputs are valid and describe (regenerate) the same input. Obviously, one of them is more efficient (compresses more) than the others. But that's where implementations may differ: some search harder for matches than others. For example, kzip and 7zip are known to generate better (more compressed) zip files than gzip. Even gzip alone has many compression options, each of which can generate a different compressed stream from the same input.

Now, if you want to consistently get exactly the same binary output, you need more than "zip": you need to pin a precise zip implementation (including its version) and precise compression parameters. Only then can you be sure that you always generate the same binary.
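The effect of compression options can be seen with a single implementation. Here is a minimal sketch with Python's zlib module, producing raw Deflate streams at several levels; raw_deflate is a helper name invented for this example, and the exact sizes printed will depend on the zlib version in use.

```python
import zlib

data = b"ABCABCABC" * 200

def raw_deflate(payload: bytes, level: int) -> bytes:
    """Compress payload into a raw Deflate stream at the given level."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(payload) + compressor.flush()

# Level 0 stores the data uncompressed; 1, 6 and 9 trade speed for ratio.
streams = {level: raw_deflate(data, level) for level in (0, 1, 6, 9)}

for level, stream in streams.items():
    # Every stream decodes back to exactly the same input.
    assert zlib.decompress(stream, -15) == data
    print("level", level, "->", len(stream), "bytes")

distinct = len(set(streams.values()))
print(distinct, "distinct byte streams for one input")
```

All streams are valid Deflate and round-trip to the same bytes, yet they are not the same binary.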
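A sketch of what pinning the parameters can look like, again in Python rather than .NET. One extra detail worth knowing: the gzip header (RFC 1952) carries a modification-time field, so the timestamp has to be fixed as well as the level. The function name pinned_gzip is hypothetical, and the mtime argument of gzip.compress assumes Python 3.8 or later.

```python
import gzip
import hashlib

def pinned_gzip(payload: bytes) -> bytes:
    """Gzip payload with a fixed level and a fixed header timestamp."""
    return gzip.compress(payload, compresslevel=9, mtime=0)

first = pinned_gzip(b"hello")
second = pinned_gzip(b"hello")

# Same implementation, same parameters, same timestamp: same bytes.
print(first == second)
print(hashlib.sha256(first).hexdigest())
```

This only guarantees identical output for the same library build. A different zlib version, or a different implementation such as .NET's, may still produce other bytes for the same level, so compare digests across platforms before relying on it.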

Cyan answered Jan 30 '23 12:01
