An old British Informatics Olympiad question (3c) asks for the smallest unambiguous encoding scheme for the alphabet (using only two symbols, hence binary). As far as I can see, the answer is 130: 5 bits are required to store each letter, since 2^4 = 16 < 26 <= 2^5 = 32, and the alphabet has 26 characters, so the encoding scheme is 5 * 26 = 130 bits long. However, the mark scheme states that 124 bits can be used. What encoding scheme is that long?
I think this works:
a - 0010
b - 0011
c - 0100
d - 0101
e - 0110
f - 0111
g - 10000
h - 10001
i - 10010
j - 10011
k - 10100
l - 10101
m - 10110
n - 10111
o - 11000
p - 11001
q - 11010
r - 11011
s - 11100
t - 11101
u - 11110
v - 11111
w - 00000
x - 00001
y - 00010
z - 00011
It is unambiguous. If a code word starts with a 1, it has length 5. If it starts with 000, it also has length 5. Otherwise (it starts with one or two 0s followed by a 1), it has length 4.
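To see that this rule really decodes without ambiguity, here is a minimal Python sketch (my own addition, assuming the table above); it reads the stream left to right, using the length rule to decide how many bits to consume:

```python
# Code table from the answer above.
CODES = {
    'a': '0010', 'b': '0011', 'c': '0100', 'd': '0101',
    'e': '0110', 'f': '0111',
    'g': '10000', 'h': '10001', 'i': '10010', 'j': '10011',
    'k': '10100', 'l': '10101', 'm': '10110', 'n': '10111',
    'o': '11000', 'p': '11001', 'q': '11010', 'r': '11011',
    's': '11100', 't': '11101', 'u': '11110', 'v': '11111',
    'w': '00000', 'x': '00001', 'y': '00010', 'z': '00011',
}
DECODE = {bits: letter for letter, bits in CODES.items()}

def decode(stream: str) -> str:
    """Decode a bit string using the length rule: a code word starting
    with 1 or with 000 is 5 bits long; anything else is 4 bits."""
    out, i = [], 0
    while i < len(stream):
        n = 5 if stream[i] == '1' or stream[i:i + 3] == '000' else 4
        out.append(DECODE[stream[i:i + n]])
        i += n
    return ''.join(out)

# Round-trip the whole alphabet: 124 bits, decoded unambiguously.
alphabet = 'abcdefghijklmnopqrstuvwxyz'
encoded = ''.join(CODES[c] for c in alphabet)
assert len(encoded) == 124
assert decode(encoded) == alphabet
```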
I got the idea by starting with a through h being length 4, using 0 as the first symbol. However, a scheme like that is two symbols short (if length is determined entirely by the first bit, you only get 8 + 16 = 24 code words), so I looked for a way to reduce the number of four-bit codes by two... and noticed that 0000 and 0001 were the only two containing a triple 0. Behind the 000 prefix, two bits give you four characters, and the rest is an unambiguous encoding scheme :)
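Equivalently, the scheme is prefix-free: no code word is a prefix of another, which is what guarantees unambiguity. A quick mechanical check (again a sketch of mine, rebuilding the 26 code words from the construction):

```python
# Rebuild the code words: 6 four-bit codes (0010 through 0111),
# 16 five-bit codes starting with 1, and 4 five-bit codes under 000.
codes = [format(i, '04b') for i in range(2, 8)] \
      + [format(16 + i, '05b') for i in range(16)] \
      + [format(i, '05b') for i in range(4)]

assert len(set(codes)) == 26
# Prefix-free: no code word is a proper prefix of a different one.
assert all(not b.startswith(a)
           for i, a in enumerate(codes)
           for j, b in enumerate(codes) if i != j)
```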
6 * 4 + 20 * 5 = 124 bits in total, or alternatively, counting code words: 4 + 16 + 6 = 26.
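As a final sanity check (my addition, not part of the mark scheme), the code is complete in the Kraft inequality sense: a binary prefix-free code with given lengths exists iff the sum of 2^-length over all code words is at most 1, and here the sum is exactly 1, so no code word could be shortened without breaking unambiguity:

```python
from fractions import Fraction

# Kraft sum for 6 length-4 and 20 length-5 code words: exactly 1,
# meaning the code uses up the binary code space with no slack.
assert 6 * Fraction(1, 2**4) + 20 * Fraction(1, 2**5) == 1
```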