Is there mathematical proof that Huffman coding is the most efficient lossless compression algorithm?

My friend told me it existed but I could never find it, not sure if he was lying but I'm very interested as to how the proof works. (Yes, I'm one of those people who found out about Huffman coding from the Silicon Valley TV show, sorry)

suchaHassle asked Nov 03 '15


People also ask

Is Huffman coding lossless compression?

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, with the lengths of the assigned codes based on the frequencies of the corresponding characters. The most frequent character gets the smallest code and the least frequent character gets the largest code.

Why Huffman coding is known as lossless coding?

Huffman coding is a method of lossless compression. Lossless compression is valuable because it can reduce the amount of information (or, in your computer, memory) needed to communicate the exact same message. That means the process is perfectly invertible. Lossy compression, on the other hand, will lose information.

Is Huffman coding always optimal?

However, although optimal among methods that encode symbols separately, Huffman coding is not always optimal among all compression methods; it is replaced with arithmetic coding or asymmetric numeral systems when a better compression ratio is required.

How is Huffman coding efficiency calculated?

Given that the source entropy is H and the average codeword length is L, we can characterise the quality of a code either by its efficiency, η = H/L, or by its redundancy, R = L − H. Clearly, we have η = H/(H+R).
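
For example, here is a small Python sketch of those formulas, using a made-up dyadic distribution and matching code lengths (both are assumptions for illustration, not taken from the original post):

```python
# A small sketch of the efficiency/redundancy formulas above, using a
# made-up source distribution and code lengths.
import math

probs   = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
lengths = {"A": 1,   "B": 2,    "C": 3,     "D": 3}      # a Huffman code for these probs

H = -sum(p * math.log2(p) for p in probs.values())        # source entropy (bits/symbol)
L = sum(probs[s] * lengths[s] for s in probs)             # average codeword length
eta = H / L                                               # efficiency
R = L - H                                                 # redundancy

print(f"H = {H:.3f}, L = {L:.3f}, efficiency = {eta:.3f}, redundancy = {R:.3f}")
# For this dyadic distribution the code is perfect: efficiency 1.0, redundancy 0.0.
```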



2 Answers

It is not the most efficient lossless compression method; arithmetic coding beats it, for a start. Since it is not the most efficient, there is no proof that it is. I believe it is the optimal code when using an integer number of bits per symbol, however, so perhaps that is the proof your friend was talking about.
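
To illustrate that integer-bits-per-symbol limitation (my own made-up example, not from the original answer): for a heavily skewed two-symbol source, a Huffman code is stuck at one bit per symbol even though the entropy is far lower, and that gap is exactly what arithmetic coding can close.

```python
# A sketch of the "integer number of bits per symbol" limitation,
# using a hypothetical heavily skewed binary source.
import math

p = 0.99                      # P(symbol "a"); P(symbol "b") = 0.01
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# With only two symbols, the best Huffman code is {"a": "0", "b": "1"},
# i.e. exactly 1 bit per symbol, no matter how skewed the source is.
L_huffman = 1.0

print(f"entropy          ~ {H:.3f} bits/symbol")   # ~0.081
print(f"Huffman average  = {L_huffman:.3f} bits/symbol")
# Arithmetic coding can approach the ~0.081 bits/symbol figure, because it
# is not forced to spend a whole bit on every symbol.
```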

mattnewport answered Sep 20 '22


The answer is it is, it isn't, and the question is ill-posed. :-)

Here is a high-level view. Lossless compression algorithms provide a reversible mapping from possible documents to be compressed to compressed documents. Documents can be viewed as strings of bits. There are 2^n possible documents with n bits, and there are 2^n possible compressed documents with n bits. Therefore the pigeonhole principle says that for every document that is stored more efficiently, some other possible document must be stored less efficiently.
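
A quick numeric check of that counting argument (my own sketch, not part of the original answer): there are only 2^n − 1 bit strings strictly shorter than n bits, which is fewer than the 2^n possible inputs, so no lossless scheme can shrink every n-bit document.

```python
# Pigeonhole check: 2^n inputs of length n, but only 2^n - 1 strictly
# shorter bit strings available as outputs.
for n in range(1, 11):
    inputs  = 2 ** n
    shorter = sum(2 ** k for k in range(n))   # strings of length 0 .. n-1
    print(f"n={n:2d}: {inputs} inputs, only {shorter} shorter outputs")
```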

So how is compression possible? It is possible because, while all documents are possible, they are not equally likely. A good compression algorithm will store likely documents very efficiently and unlikely ones inefficiently. But then the question is which documents are likely. The answer to that is, "It depends." And how good a compression algorithm is will also depend on that.

Suppose that you take the set of random documents made out of a set of symbols that independently appear with different probabilities. For that case, Huffman coding produces the most efficient possible compression among methods that encode each symbol separately.
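
As a concrete illustration (my own made-up distribution, not from the original answer), here is a short Python sketch that builds a Huffman code for such a source and compares its average codeword length with the source entropy:

```python
# Build a Huffman code for a made-up i.i.d. symbol distribution and compare
# its average codeword length with the source entropy.
import heapq
import math
from itertools import count

def huffman_codes(probs):
    """Build a Huffman code from a dict of symbol -> probability."""
    tiebreak = count()          # keeps the heap from ever comparing tree nodes
    heap = [(p, next(tiebreak), (sym, None, None)) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (None, left, right)))
    codes = {}
    def walk(node, prefix):
        sym, left, right = node
        if sym is not None:
            codes[sym] = prefix or "0"      # degenerate single-symbol case
        else:
            walk(left, prefix + "0")
            walk(right, prefix + "1")
    walk(heap[0][2], "")
    return codes

probs = {"e": 0.40, "t": 0.25, "a": 0.20, "q": 0.10, "z": 0.05}
codes = huffman_codes(probs)
H = -sum(p * math.log2(p) for p in probs.values())
L = sum(probs[s] * len(codes[s]) for s in probs)

for sym in sorted(probs, key=probs.get, reverse=True):
    print(sym, codes[sym])                  # the likelier the symbol, the shorter the code
print(f"entropy H ~ {H:.3f} bits/symbol, Huffman average L = {L:.3f} bits/symbol")
# No symbol-by-symbol prefix code can beat L here, and L always lands in [H, H+1).
```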

Now suppose you take the set of random sentences that are likely to be written in English. Huffman coding is limited to looking at raw letter frequencies. It makes no use of the fact that certain combinations of letters appear very frequently. Other encodings that can exploit that fact will now work better.
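
A rough sketch of that point (my own example, using an artificial repetitive sample rather than real English text): the per-letter entropy drops sharply once you condition on the previous letter, and that gap is what context-aware coders can exploit while a per-letter Huffman code cannot.

```python
# Compare order-0 (single letter) entropy, which bounds a per-letter Huffman
# code, with the conditional entropy given the previous letter.
import math
from collections import Counter

text = "the quick brown fox jumps over the lazy dog " * 50

def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

order0 = entropy(Counter(text))

pairs = Counter(zip(text, text[1:]))
prev_counts = Counter(text[:-1])
# H(next | prev) = sum over prev of P(prev) * H(next given that prev)
cond = 0.0
for prev, total in prev_counts.items():
    following = Counter({b: c for (a, b), c in pairs.items() if a == prev})
    cond += total / len(text[:-1]) * entropy(following)

print(f"order-0 entropy    ~ {order0:.3f} bits/letter")
print(f"entropy given prev ~ {cond:.3f} bits/letter")
```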

Now suppose you take the set of documents that could be produced by your camera. This looks nothing like text, and different encoding methods will work better.

So there are cases where Huffman is best. Cases where it isn't. And the question is ill-posed since it depends on, "What documents are likely?"

btilly answered Sep 18 '22