 

Theory: Compression algorithm that makes some files smaller but none bigger?

I came across this question:

"A lossless compression algorithm claims to guarantee to make some files smaller and no files larger.
Is this:

a) Impossible

b) Possible but may run for an indeterminate amount of time,

c) Possible for compression factor 2 or less,

d) Possible for any compression factor?"

I'm leaning towards (a), but I couldn't give a solid explanation as to why. (A friend and I will post the thoughts we came up with as a possible answer.)

Asked Oct 03 '09 by RJFalconer

2 Answers

By the pigeonhole principle, a string of 10 bits has 1024 possible values, but mapping every input to 9 bits or fewer leaves only 2^10 − 1 = 1023 possible outputs.

This guarantees that either the algorithm has collisions (i.e. it is lossy compression) or at some point it chooses to return the unmodified input as output.

In the latter case, you cannot determine how to decompress an arbitrary string of bits. (It could be an unmodified input, or the compressed output of a larger bit string.)

-> Impossible.
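The counting behind the pigeonhole argument can be checked directly (a minimal Python sketch illustrating the answer above, not part of the original post):

```python
# Inputs: every bit string of length exactly 10
inputs = 2 ** 10  # 1024

# Outputs: every bit string of length 0..9, the only "smaller" candidates
shorter_outputs = sum(2 ** k for k in range(10))  # 2^0 + ... + 2^9 = 1023

print(inputs, shorter_outputs)   # 1024 1023
# With more inputs than outputs, at least two inputs must map to
# the same output, so the scheme cannot be losslessly reversible.
print(inputs > shorter_outputs)  # True
```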

Answered Jan 03 '23 by RJFalconer

Just a slight clarification of RJFalconer's post...

You only need some files to become smaller, so the claim that a string of 10 bits has to map to 9 bits or fewer isn't quite right. In particular, someone could propose a scheme that maps every string of 10 bits or fewer to itself (i.e. the identity transformation), which trivially makes no file larger.

However, we are told that there is at least one file which does become smaller. Without loss of generality, consider a file that starts with x bits and ends up as y bits, where y is strictly less than x.

Now consider the domain of "files with y bits or fewer", which contains 2^(y+1) − 1 bit strings (including the empty one). In order for none of those to result in a bigger file, each has to map to a bit string in the same domain, i.e. to one of those 2^(y+1) − 1 compressed files. However, we already know that the initial string of length x bits compresses to one of those values, leaving only 2^(y+1) − 2 possible values.

At this point the pigeonhole principle comes in: you clearly can't map 2^(y+1) − 1 inputs to 2^(y+1) − 2 outputs without repeating an output, which violates the reversibility of the compression.
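To make the counting concrete, here is a small Python sketch (the helper name `strings_up_to` is just for illustration) verifying that the domain of bit strings of length y or fewer has 2^(y+1) − 1 members, one fewer output slot than needed once the x-bit file's output is reserved:

```python
def strings_up_to(y):
    """Number of bit strings of length 0..y, including the empty string."""
    return sum(2 ** k for k in range(y + 1))  # equals 2^(y+1) - 1

for y in range(1, 8):
    domain = strings_up_to(y)
    assert domain == 2 ** (y + 1) - 1
    # One output is already taken by the compressed x-bit file,
    # so only domain - 1 outputs remain for all domain inputs:
    print(y, domain, domain - 1)
```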

Answered Jan 03 '23 by Jon Skeet