Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

LZ4 compression algorithm explanation

Description from Wikipedia:

The LZ4 algorithm represents the data as a series of sequences. Each sequence begins with a one byte token that is broken into two 4 bit fields. The first field represents the number of literal bytes that are to be copied to the output. The second field represents the number of bytes to copy from the already decoded output buffer (with 0 representing the minimum match length of 4 bytes). A value of 15 in either of the bitfields indicates that the length is larger and there is an extra byte of data that is to be added to the length. A value of 255 in these extra bytes indicates that yet another byte to be added. Hence arbitrary lengths are represented by a series of extra bytes containing the value 255. After the string of literals comes the token and any extra bytes needed to indicate string length. This is followed by an offset that indicates how far back in the output buffer to begin copying. The extra bytes (if any) of the match-length come at the end of the sequence

I didn't understand that at all! Does anyone have an easy way to understand example? For example, in the above explanation what is a literal byte and what is a match? How can we have a decoded output buffer when we're just beginning to compress? Length of what?

The explanation at here was also impenetrable for me.

A simple example would be nice unless you have a better way of explaining it.

like image 704
ade Avatar asked Jan 15 '14 13:01

ade


People also ask

How does LZ4 compression work?

The LZ4 algorithm represents the data as a series of sequences. Each sequence begins with a one-byte token that is broken into two 4-bit fields. The first field represents the number of literal bytes that are to be copied to the output.

What is compression ratio LZ4?

LZ4 is lossless compression algorithm, providing compression speed > 500 MB/s per core (>0.15 Bytes/cycle). It features an extremely fast decoder, with speed in multiple GB/s per core (~1 Byte/cycle).

How do compression algorithms work?

Compression algorithms reduce the number of bytes required to represent data and the amount of memory required to store images. Compression allows a larger number of images to be stored on a given medium and increases the amount of data that can be sent over the internet.

How does LZO compression work?

LZO compresses a block of data into matches (a sliding dictionary) and runs of non-matching literals to produce good results on highly redundant data and deals acceptably with non-compressible data, only expanding incompressible data by a maximum of 1/64 of the original size when measured over a block size of at least ...


1 Answers

First, read about LZ77, the core approach being used. The text is a description of a particular way to code a series of literals and string matches in the preceding data.

A match is when the next bytes in the uncompressed data occur in the previously decompressed data. So instead of sending those bytes directly, a length and an offset is sent. Then you go offset bytes backwards and copy length bytes to the output.

Yes, you can't have a match at the beginning of the stream. You have to start with literals. (Unless there is a preset dictionary, which is another topic.)

like image 125
Mark Adler Avatar answered Oct 04 '22 04:10

Mark Adler