Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Huffman code with lookup table

Tags:

c++

I have read an article on Internet and know that the natural way of decoding by traversing from root but I want to do it faster with a lookup table.

After reading, I still cannot get the points.

ex:

input:"abcdaabaaabaaa"
code   data
0      a
10     b
110    c
111    d

The article says that due to variable length, it determine the length by taking a bit of string of max code length and use it as index.

output:"010110111001000010000"
Index    Index(binary)    Code    Bits required
0        000              a       1
1        001              a       1
2        010              a       1
3        011              a       1
4        100              b       2
5        101              b       2
6        110              c       3
7        111              d       3

My questions are:

  1. What does it means due to variable length, it determine the length by taking a bit of string of max code length? How to determine the length?

  2. How to generate the lookup table and how to use it? What is the algorithm behind?

like image 966
Jason Avatar asked Dec 10 '12 16:12

Jason


People also ask

Which algorithm is best for Huffman coding?

Explanation: Greedy algorithm is the best approach for solving the Huffman codes problem since it greedily searches for an optimal solution.

When should you not use Huffman coding?

The worst case for Huffman coding can happen when the probability of a symbol exceeds 2^(−1) = 0.5, making the upper limit of inefficiency unbounded.

What is a Huffman table?

In a JPEG bit-stream, a Huffman table is specified by two lists, BITS and HUFFVAL. BITS is a 16-byte array contained in the codeword stream where byte n simply gives the number of codewords of length n that are present in the Huffman table. HUFFVAL is a list of symbol values in order of increasing codeword length.


1 Answers

For your example, the maximum code length is 3 bits. So you take the first 3 bits from your stream (010) and use that to index the table. This gives code, 'a' and bits = 1. You consume 1 bit from your input stream, output the code, and carry on. On the second go around you will get (101), which indexes as 'b' and 2 bits, etc.

To build the table, make it as large as 1 << max_code_length, and fill in details as if you are decoding the index as a huffman code. If you look at your example all the indices which begin '0' are a, indices beginning '10' are b, and so on.

like image 51
JasonD Avatar answered Oct 13 '22 16:10

JasonD