 

Data Compression Algorithms

I was wondering if anyone has a list of data compression algorithms. I know basically nothing about data compression, and I was hoping to learn more about different algorithms and see which ones are the newest and which have not yet been implemented on many ASICs.

I'm hoping to implement a data compression ASIC which is independent of the type of data coming in (audio, video, images, etc.).

If my question is too open ended, please let me know and I'll revise. Thank you

Veridian asked May 09 '13


2 Answers

There are a ton of compression algorithms out there. What you need here is a lossless compression algorithm. A lossless compression algorithm compresses data such that it can be decompressed back to exactly the original input. The opposite is a lossy compression algorithm, which discards some of the data to achieve a smaller size. PNG images use lossless compression, while JPEG images can and often do use lossy compression.

Some of the most widely known compression algorithms include:

  • RLE
  • Huffman
  • LZ77

ZIP archives use a combination of Huffman coding and LZ77 to give fast compression and decompression times and reasonably good compression ratios.
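
As a quick sanity check of that combination, here is a sketch using Python's standard `zlib` module, whose DEFLATE format is the same Huffman + LZ77 scheme ZIP archives use:

```python
import zlib

# Repetitive input gives LZ77 plenty of back-references to exploit
data = b"abcabcabc" * 100

compressed = zlib.compress(data)
restored = zlib.decompress(compressed)

# Lossless: decompression recovers the original bytes exactly
assert restored == data
print(len(data), "->", len(compressed), "bytes")
```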

LZ77 is pretty much a generalized form of RLE and it will often yield much better results.
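
To make the comparison concrete, here is a minimal run-length encoding sketch (the function names are my own, not from any particular library):

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse each run of identical bytes into a (count, byte) pair."""
    runs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        runs.append((j - i, data[i]))
        i = j
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (count, byte) pairs back into the original byte string."""
    return b"".join(bytes([byte]) * count for count, byte in runs)

encoded = rle_encode(b"aaaaaaaabbbbbcccdd")
# → [(8, 97), (5, 98), (3, 99), (2, 100)]
```

RLE only pays off when the input has long runs of identical bytes; LZ77 generalizes this by referencing any earlier occurrence of a substring, not just an immediately preceding repeat.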

Huffman coding assigns the fewest bits to the most frequently occurring bytes. Imagine a text file that looked like this:

aaaaaaaabbbbbcccdd 

A typical implementation of Huffman would result in the following map:

    Bits   Character
       0   a
      10   b
     110   c
     111   d

So the file would be compressed to this:

    00000000 10101010 10110110 11011111 10000000
                                         ^^^^^^^
                                    Padding bits required

18 bytes go down to 5. Of course, the table must be included in the file. This algorithm works better with more data :P
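
A minimal sketch of how such a code table could be built, using Python's standard `heapq`. (A real implementation would also serialize the table and pack the bits; the exact 0/1 assignments may differ from the map above depending on tie-breaking, but the code lengths come out the same.)

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict[int, str]:
    """Build a Huffman code table mapping each byte to a bit string."""
    freq = Counter(data)
    # Heap entries: (frequency, tiebreaker, {byte: code-so-far})
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees...
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        # ...prefixing one side's codes with 0 and the other's with 1
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes(b"aaaaaaaabbbbbcccdd")
bits = sum(len(codes[b]) for b in b"aaaaaaaabbbbbcccdd")
```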

Alex Allain has a nice article on the Huffman Compression Algorithm in case the Wiki doesn't suffice.

Feel free to ask for more information. This topic is pretty darn wide.

user123 answered Oct 01 '22

My paper A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems (permalink here) reviews many compression algorithms and also techniques for using them in modern processors. It reviews both research-grade and commercial-grade compression algorithms/techniques, so you may find one which has not yet been implemented in ASIC.

user984260 answered Oct 01 '22