
I need to choose a compression algorithm

I need to choose a compression algorithm to compress some data. I don't know the type of data I'll be compressing in advance (think of it as kinda like the WinRAR program).

I've heard of the following algorithms but I don't know which one I should use. Can anyone post a short list of pros and cons? For my application the first priority is decompression speed; the second priority is space saved. Compression (not decompression) speed is irrelevant.

  • Deflate
  • Implode
  • Plain Huffman
  • bzip2
  • lzma

chiz asked Mar 07 '10 18:03


1 Answer

I ran a few benchmarks compressing a .tar that contained a mix of high-entropy data and text. These are the results:

Name  - Compression rate* - Decompression Time
7zip  - 87.8%             - 0.703s
bzip2 - 80.3%             - 1.661s
gzip  - 72.9%             - 0.347s
lzo   - 70.0%             - 0.111s

*Higher is better

From this I came to the conclusion that the compression rate of an algorithm depends on its name; the first in alphabetical order will be the one with the best compression rate, and so on.

Therefore I decided to rename lzo to 1lzo. Now I have the best algorithm ever.


EDIT: It's worth noting that, unfortunately, lzo is the only one of these with a very restrictive license (GPL) :(
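
If you want to repeat this kind of comparison on your own data, here is a minimal sketch using only Python's standard library. It is not the setup behind the table above. It covers Deflate (via zlib), bzip2 and lzma only, since lzo and 7zip have no stdlib binding. The sample payload, the make_sample and benchmark helper names, and reading "compression rate" as percentage of space saved are all my assumptions.

```python
import bz2
import lzma
import os
import time
import zlib

LINE = b"the quick brown fox jumps over the lazy dog\n"


def make_sample(size=1 << 20):
    """Hypothetical test payload: half repetitive text, half random bytes."""
    half = size // 2
    text = (LINE * (half // len(LINE) + 1))[:half]
    noise = os.urandom(size - half)  # high-entropy part, essentially incompressible
    return text + noise


# name -> (compress, decompress); default compression levels are used throughout
CODECS = {
    "deflate (zlib)": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma (xz)": (lzma.compress, lzma.decompress),
}


def benchmark(data, repeats=3):
    """Return (name, percent_saved, best_decompression_seconds) per codec."""
    results = []
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            restored = decompress(packed)
            best = min(best, time.perf_counter() - start)
        assert restored == data  # sanity check: the round trip must be lossless
        saved = 100.0 * (1 - len(packed) / len(data))
        results.append((name, saved, best))
    return results


if __name__ == "__main__":
    for name, saved, seconds in benchmark(make_sample()):
        print(f"{name:15} - {saved:5.1f}% saved - {seconds:.4f}s")
```

Only decompression is timed, since that is the stated first priority. Swap make_sample() for the bytes of a real file to get numbers that mean something for your application.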

Thomas Bonini answered Oct 20 '22 05:10