An old British Informatics Olympiad question (3c) asks for the smallest unambiguous encoding scheme for the alphabet (using only two symbols, hence binary). As far as I can see, the answer is 130: 5 bits are required to store each letter, since 2^4 = 16 < 26 <= 2^5 = 32, and the alphabet has 26 characters, so the encoding scheme is 5 * 26 = 130 bits long. However, the mark scheme states that 124 bits can be used. What encoding scheme is that long?
I think this works:
a - 0010
b - 0011
c - 0100
d - 0101
e - 0110
f - 0111
g - 10000
h - 10001
i - 10010
j - 10011
k - 10100
l - 10101
m - 10110
n - 10111
o - 11000
p - 11001
q - 11010
r - 11011
s - 11100
t - 11101
u - 11110
v - 11111
w - 00000
x - 00001
y - 00010
z - 00011
It is unambiguous. If a code word starts with a 1, it has length 5. If it starts with 000, it also has length 5. Otherwise (it starts with one or two 0s followed by a 1), it has length 4.
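To see that this rule really decodes without ambiguity, here is a minimal Python sketch (my own addition, assuming the table above); it reads the stream left to right, using the length rule to decide how many bits to consume:

```python
# Code table from the answer above.
CODES = {
    'a': '0010', 'b': '0011', 'c': '0100', 'd': '0101',
    'e': '0110', 'f': '0111',
    'g': '10000', 'h': '10001', 'i': '10010', 'j': '10011',
    'k': '10100', 'l': '10101', 'm': '10110', 'n': '10111',
    'o': '11000', 'p': '11001', 'q': '11010', 'r': '11011',
    's': '11100', 't': '11101', 'u': '11110', 'v': '11111',
    'w': '00000', 'x': '00001', 'y': '00010', 'z': '00011',
}
DECODE = {bits: letter for letter, bits in CODES.items()}

def decode(stream: str) -> str:
    """Decode a bit string using the length rule: a code word starting
    with 1 or with 000 is 5 bits long; anything else is 4 bits."""
    out, i = [], 0
    while i < len(stream):
        n = 5 if stream[i] == '1' or stream[i:i + 3] == '000' else 4
        out.append(DECODE[stream[i:i + n]])
        i += n
    return ''.join(out)

# Round-trip the whole alphabet: 124 bits, decoded unambiguously.
alphabet = 'abcdefghijklmnopqrstuvwxyz'
encoded = ''.join(CODES[c] for c in alphabet)
assert len(encoded) == 124
assert decode(encoded) == alphabet
```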
I got the idea by starting with a through h being length 4, using 0 as the first symbol. However, a scheme like that is two symbols short (if length is determined entirely by the first bit, you only get 8 + 16 = 24 code words), so I looked for a way to reduce the number of four-bit codes by two... and noticed that 0000 and 0001 were the only two containing a triple 0. Behind the 000 prefix, two bits give you four characters, and the rest is an unambiguous encoding scheme :)
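Equivalently, the scheme is prefix-free: no code word is a prefix of another, which is what guarantees unambiguity. A quick mechanical check (again a sketch of mine, rebuilding the 26 code words from the construction):

```python
# Rebuild the code words: 6 four-bit codes (0010 through 0111),
# 16 five-bit codes starting with 1, and 4 five-bit codes under 000.
codes = [format(i, '04b') for i in range(2, 8)] \
      + [format(16 + i, '05b') for i in range(16)] \
      + [format(i, '05b') for i in range(4)]

assert len(set(codes)) == 26
# Prefix-free: no code word is a proper prefix of a different one.
assert all(not b.startswith(a)
           for i, a in enumerate(codes)
           for j, b in enumerate(codes) if i != j)
```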
6 * 4 + 20 * 5 = 124 bits in total, or alternatively, counting code words: 4 + 16 + 6 = 26.
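As a final sanity check (my addition, not part of the mark scheme), the code is complete in the Kraft inequality sense: a binary prefix-free code with given lengths exists iff the sum of 2^-length over all code words is at most 1, and here the sum is exactly 1, so no code word could be shortened without breaking unambiguity:

```python
from fractions import Fraction

# Kraft sum for 6 length-4 and 20 length-5 code words: exactly 1,
# meaning the code uses up the binary code space with no slack.
assert 6 * Fraction(1, 2**4) + 20 * Fraction(1, 2**5) == 1
```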