Huffman trees for non-binary alphabets?

Question

Is there an easy generalization of Huffman coding trees for situations where the resulting alphabet is not binary? For instance, if I wanted to compress some text by writing it out in ternary, I could still build up a prefix-free coding system for each character I as writing out. Would the straightforward generalization of the Huffman construction (using a k-ary tree rather than a binary tree) still work correctly and efficiently? Or does this construction lead to a highly inefficient coding scheme?

hobbs · Accepted Answer

The algorithm still works and it's still simple — in fact Wikipedia has a brief reference to n-ary Huffman coding citing the original Huffman paper as a source.

It does occur to me, though, that just as Huffman is slightly suboptimal because it allocates an integer number of bits to each symbol (unlike e.g. Arithmetic coding), ternary Huffman should be a little bit more suboptimal because it has to allocate an integer number of trits. Not a show-stopper, especially for only 3, but it does indicate that n-ary Huffman will fall further behind other coding algorithms as you increase n.

Null Set · Answer

As an empirical test, I constructed binary and trinary Huffman trees for the distribution of Scrabble tiles.

The entropy of the distribution shows you can't get better than 4.37 bits per letter.

The binary Huffman tree uses on average 4.41 bits per letter.

The trinary Huffman tree uses on average 2.81 trits per letter, which has the same information density as 4.45 bits per letter.

Huffman trees for non-binary alphabets?

Tags:

algorithm

compression

huffman-code

templatetypedef

2 Answers

hobbs

Null Set

Recent Activity

Donate For Us

Huffman trees for non-binary alphabets?

Tags:

algorithm

compression

huffman-code

templatetypedef

2 Answers

hobbs

Null Set

Related questions

Recent Activity

Donate For Us