
Fixed point on a compression algorithm widely used nowadays

I was wondering if there is a compression algorithm, in common use today, that contains a fixed point, i.e., an identity file.

To explain, let's call C : byte[] -> byte[] a function that represents the compression algorithm. I want to know if there exists (and what it is, if it is possible to determine in reasonable time) a file f such that

C(f) = f

That is, a file that, when compressed by a suitable, widely-known compression algorithm in common use nowadays, will produce itself as the result.

Do you know of such a phenomenon?

Bruno Reis asked Aug 20 '09 12:08


2 Answers

Yes! This is a variant on the quine problem.

  1. An example that uses gzip: http://groups.google.com/group/comp.compression/browse_thread/thread/c57c322e15c782aa/350d9fb166fdf11f

  2. An example that uses zip/unzip: http://www.steike.com/code/useless/zip-file-quine/

ire_and_curses answered Sep 24 '22 06:09



Warning: Rather pedantic answer.

There are many cases where D(F) == F (D being defined as decompression). Compression, however, is not as precisely defined: for most compression formats, different implementations of the compressor will produce different output files (of varying sizes). Consider two programs, 1 and 2. For full interoperability, it is necessary that D1(F) == D2(F) for all valid F. Similarly, it is necessary that D2(C2(F)) == D2(C1(F)) == D1(C1(F)) == D1(C2(F)) for all valid F. However, it is entirely unnecessary that C1(F) == C2(F), and this is in fact rarely the case.
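As a rough illustration of that point, here is a minimal sketch that assumes Python's standard zlib module, with two compression levels standing in for "program 1" and "program 2". The sample input is made up; the only claim is that both streams decompress to the same bytes even though the streams themselves are not identical.

```python
import zlib

# Hypothetical sample input; any bytes would do.
data = b"to be or not to be, that is the question. " * 50

# Two stand-in "compressors" for the same format:
# zlib at its fastest level and at its strongest level.
c1 = zlib.compress(data, 1)
c2 = zlib.compress(data, 9)

# C1(F) and C2(F) are different byte strings...
print("same compressed bytes:", c1 == c2)

# ...yet any conforming decompressor recovers the same original from both.
print("same decompressed bytes:",
      zlib.decompress(c1) == zlib.decompress(c2) == data)
```

Two independent implementations of the same format would typically differ even more than two levels of one library do, which is why C(F) == F only makes sense relative to one specific compressor.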

So if you actually compress such a quine, you are unlikely to end up with the same file unless you use the same program that was used to generate it (which is unlikely, since such quines are usually hand-crafted, with C(F) never even being tested).

While it is possible (indeed, trivial!) to produce a program for which C(F) == F for some F, most people instead point to the better-defined case where D(F) == F as the quine (since D1(F) == D2(F) for all valid, compatible decompressors of F's format, assuming F is valid).
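To show just how trivial the C(F) == F case is, here is a toy sketch. Everything in it is hypothetical: `FIXED`, `toy_compress`, and `toy_decompress` are invented for illustration and do not correspond to any real format. The compressor simply reserves one file that it maps to itself and tags everything else with a marker byte before handing it to zlib.

```python
import zlib

# Hypothetical reserved file that this toy format maps to itself.
# It starts with 0x00, so it can never collide with a tagged stream below.
FIXED = b"\x00FIXED-POINT"


def toy_compress(data: bytes) -> bytes:
    """Toy compressor with a built-in fixed point (illustration only)."""
    if data == FIXED:
        return FIXED
    # Every other input is tagged with 0x01 and then zlib-compressed.
    return b"\x01" + zlib.compress(data)


def toy_decompress(blob: bytes) -> bytes:
    """Inverse of toy_compress."""
    if blob == FIXED:
        return FIXED
    if not blob.startswith(b"\x01"):
        raise ValueError("not a valid toy stream")
    return zlib.decompress(blob[1:])


print(toy_compress(FIXED) == FIXED)    # C(F) == F by construction
print(toy_decompress(FIXED) == FIXED)  # and D(F) == F as well
```

The fixed point here is an artifact of this one program, not of a format that anyone else implements, which is exactly why it says little about a widely used algorithm.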

So there are likely cases where C(F) == F, but generally this is the wrong question to ask; you should instead ask for cases where D(F) == F, which the other people who answered the question have provided.

Brian answered Sep 22 '22 06:09
