I was wondering if there is a compression algorithm, in common use today, that contains a fixed point, i.e., an identity file.
To explain, let's call C : byte[] -> byte[]
a function that represents the compression algorithm. I want to know if there exists (and what it is, if it is possible to determine in reasonable time) a file f
such that
C(f) = f
That is, a file that, when compressed by a suitable, widely-known compression algorithm in common use nowadays, will produce itself as the result.
Do you know of such a phenomenon?
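As a minimal sketch of what the condition C(f) = f means in practice, the check below tests a few arbitrary inputs against two compressors from the Python standard library. The helper name `is_fixed_point` and the sample inputs are assumptions made for illustration. Finding no fixed point among the samples says nothing about whether one exists.

```python
import bz2
import zlib

def is_fixed_point(compress, f: bytes) -> bool:
    """Return True when compress(f) reproduces f byte for byte."""
    return compress(f) == f

# Arbitrary sample inputs; a real search would need a far smarter strategy.
samples = [b"", b"hello", bytes(1000)]

for name, compress in [("zlib", zlib.compress), ("bz2", bz2.compress)]:
    hits = [s for s in samples if is_fixed_point(compress, s)]
    print(name, "fixed points among samples:", len(hits))
```

Even the empty input is not a fixed point here, because both formats add headers and trailers to every output.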
Yes! This is a variant on the quine problem.
An example that uses gzip: http://groups.google.com/group/comp.compression/browse_thread/thread/c57c322e15c782aa/350d9fb166fdf11f
An example that uses zip/unzip: http://www.steike.com/code/useless/zip-file-quine/
Warning: Rather pedantic answer.
There are many cases where D(F) == F (D being defined as decompression). Compression, however, is not as precisely defined. For most compression formats, different implementations of the compressor will produce different output files (of varying sizes). Consider two programs, 1 and 2. For full interoperability, D1(F) must equal D2(F) for all valid compressed files F. Similarly, D2(C2(F)) == D2(C1(F)) == D1(C1(F)) == D1(C2(F)) must hold for all input files F. However, it is entirely unnecessary that C1(F) == C2(F), and in fact this is rarely the case.
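The point that C1(F) and C2(F) may differ while both decompress to the same data can be sketched with Python's `zlib` module. As an assumption for illustration, two compression levels of the same library stand in for two independent implementations; `c1`, `c2`, and the sample data are names made up for this example.

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 200

# Stand-ins for two different compressor programs: same format, different effort.
c1 = zlib.compress(data, 1)   # fastest setting
c2 = zlib.compress(data, 9)   # best-compression setting

# The compressed files differ byte for byte...
print("same compressed bytes:", c1 == c2)
print("sizes:", len(c1), len(c2))

# ...yet any conforming decompressor recovers the same original from both.
print("same decompressed bytes:", zlib.decompress(c1) == zlib.decompress(c2) == data)
```

The two outputs differ at least in the zlib header, which records a hint about the compression level used.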
So if you actually compress such a quine, you are unlikely to end up with the same file unless you use the same program that was used to generate it (which is unlikely, since such quines are usually hand-crafted, with C(F) never even being tested).
While it is possible (indeed, trivial!) to produce a program for which C(F) == F for some F, most people instead point to the better-defined case where D(F) == F as quines (since D1(F) == D2(F) for all valid, compatible decompressors of the format of F, assuming F is valid).
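To illustrate why a compressor with a fixed point is trivial to construct, here is a sketch of a made-up wrapper format around `zlib`. Everything in it (`MAGIC`, `ZLIB_TAG`, `toy_compress`, `toy_decompress`) is hypothetical and invented for this example; it is not a format in common use, so it does not answer the original question.

```python
import zlib

# Hypothetical format, invented for this example only.
MAGIC = b"\x01QUINE"   # the one reserved file; starts with 0x01
ZLIB_TAG = b"\x00"     # every other output starts with 0x00

def toy_compress(data: bytes) -> bytes:
    """Map the reserved file to itself; tag and deflate everything else."""
    if data == MAGIC:
        return MAGIC
    return ZLIB_TAG + zlib.compress(data)

def toy_decompress(blob: bytes) -> bytes:
    """Invert toy_compress; the leading byte tells the two cases apart."""
    if blob == MAGIC:
        return MAGIC
    if not blob.startswith(ZLIB_TAG):
        raise ValueError("not a valid toy stream")
    return zlib.decompress(blob[1:])

print(toy_compress(MAGIC) == MAGIC)      # fixed point of compression
print(toy_decompress(MAGIC) == MAGIC)    # and of decompression
```

Because the reserved file and the tagged outputs start with different bytes, decompression stays unambiguous and the scheme remains lossless for every input.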
So, there are likely cases where C(F) == F, but generally this is the wrong question to ask; you should instead ask for cases where D(F) == F, which other answers to this question have provided.