LZ4 compression algorithm explanation

Tags: algorithm, compression, lz4

Description from Wikipedia:

The LZ4 algorithm represents the data as a series of sequences. Each sequence begins with a one-byte token that is broken into two 4-bit fields. The first field represents the number of literal bytes that are to be copied to the output. The second field represents the number of bytes to copy from the already decoded output buffer (with 0 representing the minimum match length of 4 bytes). A value of 15 in either of the bitfields indicates that the length is larger and there is an extra byte of data that is to be added to the length. A value of 255 in these extra bytes indicates that yet another byte is to be added. Hence arbitrary lengths are represented by a series of extra bytes containing the value 255. The string of literals comes after the token and any extra bytes needed to indicate string length. This is followed by an offset that indicates how far back in the output buffer to begin copying. The extra bytes (if any) of the match-length come at the end of the sequence.

I didn't understand that at all! Does anyone have an easy-to-understand example? For example, in the above explanation, what is a literal byte and what is a match? How can we have a decoded output buffer when we're just beginning to compress? Length of what?

The explanation here was also impenetrable to me.

A simple example would be nice unless you have a better way of explaining it.

704

asked Jan 15 '14 13:01

ade

1 Answer

First, read about LZ77, the core approach being used. The text is a description of a particular way to code a series of literals and string matches in the preceding data.

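A literal is a byte that is sent as-is and copied straight to the output. A match is when the next bytes in the uncompressed data already occur in the previously decompressed data. So instead of sending those bytes directly, a length and an offset are sent. The decompressor then goes offset bytes backwards in its output and copies length bytes from there to the end of the output.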
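To make the "go back offset bytes and copy length bytes" step concrete, here is a minimal Python sketch. The function name `copy_match` and the sample data are made up for illustration; this is not code from LZ4 itself. It also shows that the length may be larger than the offset, in which case the copy reads bytes it has just written and the pattern repeats.

```python
def copy_match(out: bytearray, offset: int, length: int) -> None:
    """Append `length` bytes to `out`, reading from `offset` bytes before its end.

    Hypothetical helper for illustration; not part of any LZ4 library.
    """
    if not 0 < offset <= len(out):
        raise ValueError("offset must point inside the data already produced")
    start = len(out) - offset
    # Copy one byte at a time: when length > offset, the source range overlaps
    # the bytes being appended, which repeats the pattern.
    for i in range(length):
        out.append(out[start + i])


out = bytearray(b"abcd")             # four literals, sent as-is
copy_match(out, offset=4, length=6)  # a match: go back 4 bytes, copy 6
print(out.decode())                  # abcdabcdab
```

This is also why there is always a "decoded output buffer" to refer to: the match points into whatever the decompressor has produced so far, which the compressor knows because it is the input it has already processed.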

Yes, you can't have a match at the beginning of the stream. You have to start with literals. (Unless there is a preset dictionary, which is another topic.)
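For the layout in the quoted description, a rough decoder sketch in Python may help. It assumes the raw LZ4 block layout (token, optional extra literal-length bytes, literals, offset, optional extra match-length bytes) with the offset stored as 2 bytes, little-endian, as in the LZ4 block format. The function name `decode_lz4_block` is made up, the input is hand-built for illustration rather than produced by a real compressor, and the sketch skips the bounds checks a real decoder needs.

```python
def decode_lz4_block(src: bytes) -> bytes:
    """Decode one raw LZ4 block (illustrative sketch, minimal validation)."""
    out = bytearray()
    pos = 0
    while pos < len(src):
        token = src[pos]
        pos += 1

        # High 4 bits: number of literals. 15 means extra length bytes follow,
        # each added to the total; a byte of 255 means yet another follows.
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                extra = src[pos]
                pos += 1
                lit_len += extra
                if extra != 255:
                    break

        # The literals themselves come right after the token and length bytes.
        out += src[pos:pos + lit_len]
        pos += lit_len
        if pos >= len(src):
            break  # the final sequence carries literals only, no match

        # Offset: how far back in the output the match starts.
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(out):
            raise ValueError("offset points outside the decoded data")

        # Low 4 bits: match length minus 4, extended the same way as above.
        match_len = token & 0x0F
        if match_len == 15:
            while True:
                extra = src[pos]
                pos += 1
                match_len += extra
                if extra != 255:
                    break
        match_len += 4  # a stored 0 means the minimum match length of 4

        # Byte-by-byte copy so overlapping matches (length > offset) work.
        start = len(out) - offset
        for i in range(match_len):
            out.append(out[start + i])
    return bytes(out)


# Token 0x44: 4 literals, match length 4 + 4 = 8. Offset 4 (bytes 0x04 0x00).
# Token 0x30: 3 literals and the end of the block.
block = bytes([0x44]) + b"abcd" + bytes([0x04, 0x00]) + bytes([0x30]) + b"xyz"
print(decode_lz4_block(block))  # b'abcdabcdabcdxyz'
```

Note that the first sequence starts with literals, as it must: there is nothing to match against until some output exists.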

125

answered Oct 04 '22 04:10

Mark Adler
