 

Theory: Compression algorithm that makes some files smaller but none bigger?

I came across this question:

"A lossless compression algorithm claims to guarantee to make some files smaller and no files larger.
Is this:

a) Impossible

b) Possible but may run for an indeterminate amount of time,

c) Possible for compression factor 2 or less,

d) Possible for any compression factor?"

I'm leaning towards (a), but I couldn't give a solid explanation as to why. (A friend and I will post the thoughts we came up with as a possible answer.)

Asked Oct 03 '09 by RJFalconer

2 Answers

By the pigeonhole principle, a string of 10 bits has 1024 possible values, but mapping every input to 9 bits or fewer leaves only 2^10 − 1 = 1023 possible outputs.

This guarantees that either the algorithm has collisions (i.e. it is lossy compression) or at some point it chooses to return the unmodified input as output.

In the latter case, you cannot determine how to decompress an arbitrary string of bits. (It could be an unmodified input, or the compressed output of a larger bit string.)

-> Impossible.
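The counting behind the pigeonhole argument can be checked directly (a minimal Python sketch illustrating the answer above, not part of the original post):

```python
# Inputs: every bit string of length exactly 10
inputs = 2 ** 10  # 1024

# Outputs: every bit string of length 0..9, the only "smaller" candidates
shorter_outputs = sum(2 ** k for k in range(10))  # 2^0 + ... + 2^9 = 1023

print(inputs, shorter_outputs)   # 1024 1023
# With more inputs than outputs, at least two inputs must map to
# the same output, so the scheme cannot be losslessly reversible.
print(inputs > shorter_outputs)  # True
```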

Answered Jan 03 '23 by RJFalconer

Just a slight clarification of RJFalconer's post...

You only need some files to become smaller, so the claim that a string of 10 bits has to map to 9 bits or fewer isn't quite right. In particular, someone could propose a scheme that maps every string of 10 bits or fewer to itself (i.e. the identity transformation), which trivially makes no file larger.

However, we are told that there is at least one file which does become smaller. Without loss of generality, consider a file that starts with x bits and ends up as y bits, where y is strictly less than x.

Now consider the domain of "files with y bits or fewer", which contains 2^(y+1) − 1 bit strings (including the empty one). In order for none of those to result in a bigger file, each has to map to a bit string in the same domain, i.e. to one of those 2^(y+1) − 1 compressed files. However, we already know that the initial string of length x bits compresses to one of those values, leaving only 2^(y+1) − 2 possible values.

At this point the pigeonhole principle comes in: you clearly can't map 2^(y+1) − 1 inputs to 2^(y+1) − 2 outputs without repeating an output, which violates the reversibility of the compression.
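To make the counting concrete, here is a small Python sketch (the helper name `strings_up_to` is just for illustration) verifying that the domain of bit strings of length y or fewer has 2^(y+1) − 1 members, one fewer output slot than needed once the x-bit file's output is reserved:

```python
def strings_up_to(y):
    """Number of bit strings of length 0..y, including the empty string."""
    return sum(2 ** k for k in range(y + 1))  # equals 2^(y+1) - 1

for y in range(1, 8):
    domain = strings_up_to(y)
    assert domain == 2 ** (y + 1) - 1
    # One output is already taken by the compressed x-bit file,
    # so only domain - 1 outputs remain for all domain inputs:
    print(y, domain, domain - 1)
```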

Answered Jan 03 '23 by Jon Skeet