In honor of the Hutter Prize, what are the top algorithms (and a quick description of each) for text compression?
Note: The intent of this question is to get a description of compression algorithms, not of compression programs.
There are three types of models: static, semiadaptive (or semistatic), and adaptive. A static model is a fixed model that is known to both the compressor and the decompressor and does not depend on the data being compressed.
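To make the distinction concrete, here is a rough sketch that estimates the ideal coded size (in bits) of the same input under each kind of model. It is an illustration under stated assumptions, not any particular compressor's method: the function names, the order-0 byte model, the uniform static table, and the sample string are all made up for this example. It also relies on the usual definitions of the two model types the text above leaves undefined: a semistatic model is built in a first pass over the data and sent along with it, while an adaptive model is updated symbol by symbol, in the same way on both sides.

```python
import math
from collections import Counter


def static_cost(data: bytes) -> float:
    """Bits needed under a fixed model: every byte value is equally likely.

    The table never changes and is known to both sides in advance.
    """
    return len(data) * math.log2(256)


def semistatic_cost(data: bytes) -> float:
    """Bits needed when frequencies are measured in a first pass over the data.

    Assumption: the cost of transmitting the frequency table is NOT counted.
    """
    counts = Counter(data)
    total = len(data)
    return -sum(c * math.log2(c / total) for c in counts.values())


def adaptive_cost(data: bytes) -> float:
    """Bits needed when the model starts uniform and learns as it goes.

    The decompressor can repeat the same updates, so no table is sent.
    """
    counts = [1] * 256          # start with one pseudo-count per byte value
    total = 256
    bits = 0.0
    for b in data:
        bits -= math.log2(counts[b] / total)  # cost of coding b right now
        counts[b] += 1                         # then update the model
        total += 1
    return bits


sample = b"abracadabra " * 50   # hypothetical input
for name, fn in [("static", static_cost),
                 ("semistatic", semistatic_cost),
                 ("adaptive", adaptive_cost)]:
    print(f"{name:>10}: {fn(sample):.0f} bits")
```

On this input the semistatic estimate should come out lowest, the static one highest, and the adaptive one in between, because the adaptive model spends some bits learning the distribution but never has to transmit a table.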
Text compression changes the representation of a file so that the (binary) compressed output takes less space to store or less time to transmit, while the original file can still be reconstructed exactly from its compressed representation.
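As a quick illustration of "smaller output, exact reconstruction", the sketch below round-trips some text through Python's standard `zlib` module. The sample string is invented for the example, and `zlib` is used only because it ships with Python; nothing above names a specific library.

```python
import zlib

# Hypothetical sample: repetitive English text compresses well.
text = ("the quick brown fox jumps over the lazy dog. " * 200).encode("utf-8")

packed = zlib.compress(text, 9)      # 9 = highest zlib compression level
restored = zlib.decompress(packed)

print("original bytes:  ", len(text))
print("compressed bytes:", len(packed))
print("lossless:        ", restored == text)
```

The equality check is the important part: a lossless compressor is only correct if the restored bytes match the input exactly.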
At maximum compression level, ZIPX is the fastest format, followed by RAR, ARC, and 7Z, ZPAQ being the slowest. Using moderate compression settings, RAR and ARC emerge as the fastest formats.
The boundary-pushing compressors combine algorithms for insane results. Common algorithms include:
Maximum Compression is a pretty cool text and general compression benchmark site. Matt Mahoney publishes another benchmark. Mahoney's may be of particular interest because it lists the primary algorithm used per entry.
There's always lzip.
All kidding aside:
[…] (DEFLATE algorithm) still wins.
[…] (LZMA algorithm) compresses very well and is available under the LGPL. Few operating systems ship with built-in support, however.
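If you want to compare the two algorithms on your own data, a minimal sketch using Python's standard library might look like this. The helper name `compressed_sizes` and the sample input are made up for the example. Note also that `zlib.compress` wraps the DEFLATE stream in a small zlib header, and `lzma.compress` writes an .xz container by default, so the numbers include a little framing overhead.

```python
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the size in bytes of data raw, DEFLATE-compressed and LZMA-compressed."""
    return {
        "raw": len(data),
        "deflate": len(zlib.compress(data, 9)),        # zlib-wrapped DEFLATE
        "lzma": len(lzma.compress(data, preset=9)),    # .xz container, LZMA2 filter
    }


# Hypothetical input; substitute the text you actually care about.
sample = ("It was the best of times, it was the worst of times. " * 300).encode("utf-8")

for name, size in compressed_sizes(sample).items():
    print(f"{name:>8}: {size} bytes")
```

Results depend heavily on the input, so treat a single run as a data point rather than a ranking.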