This is a theoretical question, so expect that many details here are not computable in practice or even in theory.
Let's say I have a string s that I want to compress. The result should be a self-extracting binary (it can be x86 assembly, but it can also be some other hypothetical Turing-complete low-level language) that outputs s.
Now, we can easily iterate through all possible such binaries and programs, ordered by size. Let B_s be the sub-list of these binaries that output s (of course, B_s is uncomputable).
As every non-empty set of positive integers must have a minimum, and B_s is non-empty (a program that simply prints s verbatim is in it), there must be a smallest program b_min_s in B_s.
For which languages (i.e. sets of strings) do we know something about the size of b_min_s? Maybe only an estimate. (I can construct some trivial examples where I can always calculate B_s and also b_min_s, but I am interested in more interesting languages.)
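To make the trivial case concrete, here is a minimal sketch in Python. It assumes a made-up toy language (not x86 and not Turing-complete) in which `a` and `b` append themselves to the output and `D` doubles the output so far. Because every program in this toy language halts, iterating over programs in order of size really does find a smallest one. The names `ALPHABET`, `run` and `shortest_program` are hypothetical and exist only for this illustration.

```python
from itertools import product

# Toy instruction set (an assumption for illustration only):
#   'a', 'b' -> append that letter to the output
#   'D'      -> double the output produced so far
ALPHABET = "abD"


def run(program, max_len=None):
    """Interpret a toy program; return None once output exceeds max_len."""
    out = ""
    for op in program:
        out = out + out if op == "D" else out + op
        # Output never shrinks, so an over-long prefix can be abandoned.
        if max_len is not None and len(out) > max_len:
            return None
    return out


def shortest_program(s):
    """Enumerate programs by increasing length; return the first that prints s."""
    # The literal program s itself works, so the search ends by length len(s).
    for length in range(len(s) + 1):
        for ops in product(ALPHABET, repeat=length):
            program = "".join(ops)
            if run(program, max_len=len(s)) == s:
                return program
    raise ValueError("s must consist only of 'a' and 'b'")


print(shortest_program("abab"))
```

The search is exponential in the length of s, so it is only usable for very short strings. For a Turing-complete language the same loop fails, because `run` may never return.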
This is Kolmogorov complexity, and you are correct that it's not computable. If it were, you could create a paradoxical program of length n that printed a string with Kolmogorov complexity m > n.
Clearly, you can bound the size of b_min_s from above for given inputs: any program that outputs s gives an upper bound. However, as far as I know, most of the efforts to do so have been existence proofs. For instance, there is an ongoing competition to compress English Wikipedia (the Hutter Prize).
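As a rough illustration of such an upper bound, the sketch below compresses the input with a few general-purpose compressors from Python's standard library and keeps the smallest result. The function name `compressed_upper_bound` is hypothetical. The figure it returns ignores the size of the decompressor itself, so it bounds the size of b_min_s only up to an additive constant.

```python
import bz2
import lzma
import zlib


def compressed_upper_bound(data: bytes) -> int:
    """Smallest size in bytes among a few stdlib encodings of data.

    This is an upper bound on the payload only; a real self-extracting
    binary would also have to carry the matching decompressor.
    """
    candidates = [
        len(data),                    # storing the data verbatim always works
        len(zlib.compress(data, 9)),  # DEFLATE
        len(bz2.compress(data, 9)),   # Burrows-Wheeler based
        len(lzma.compress(data)),     # LZMA
    ]
    return min(candidates)


sample = b"ab" * 5000
print(len(sample), compressed_upper_bound(sample))
```

A better compressor can only lower the bound. Nothing in this approach says how far the result is from the true minimum.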
Claude Shannon estimated the information density of the English language to be somewhere between 0.6 and 1.3 bits per character in his 1951 paper "Prediction and Entropy of Printed English" (Bell Sys. Tech. J. (3), pp. 50-64).
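For comparison, a much cruder estimate can be computed directly from character frequencies. The sketch below computes the order-0 empirical entropy of a text in bits per character; the function name is hypothetical. It treats every character as independent of its neighbours, so on English text it typically comes out well above Shannon's range, which relied on human predictions that exploit context.

```python
import math
from collections import Counter


def order0_entropy_bits_per_char(text: str) -> float:
    """Empirical entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the observed character frequencies p.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(order0_entropy_bits_per_char("the quick brown fox jumps over the lazy dog"))
```

Multiplying the result by the text length gives the approximate size in bits that an ideal order-0 coder would need for that text, not counting the frequency table.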