
Unambiguous binary encoding scheme for the alphabet

An old British Informatics Olympiad question (3c) asks for the smallest unambiguous binary encoding scheme for the alphabet (using only two symbols - hence binary). As far as I can see, the answer is 130 bits: since 2^4 = 16 < 26, 5 bits are required to store each letter, and the alphabet has 26 letters, so the encoding scheme is 5 * 26 = 130 bits long. However, the mark scheme states that 124 bits can be used. What is the encoding scheme that is that long?
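The fixed-length arithmetic above can be sketched quickly (my own check, not part of the question):

```python
import math

# 2^4 = 16 < 26 <= 32 = 2^5, so a fixed-length binary code
# needs 5 bits for each of the 26 letters.
bits_per_letter = math.ceil(math.log2(26))
print(bits_per_letter)       # 5
print(26 * bits_per_letter)  # 130
```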

rlms asked Sep 30 '22 00:09

1 Answer

I think this works:

a - 0010
b - 0011
c - 0100
d - 0101
e - 0110
f - 0111
g - 10000
h - 10001
i - 10010
j - 10011
k - 10100
l - 10101
m - 10110
n - 10111
o - 11000
p - 11001
q - 11010
r - 11011
s - 11100
t - 11101
u - 11110
v - 11111
w - 00000
x - 00001
y - 00010
z - 00011

It is unambiguous: if a codeword starts with a 1, it has length 5; if it starts with 000, it also has length 5; otherwise (it starts with 01 or 001), it has length 4. So the first three bits always determine the codeword's length.
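A small sketch (my own, not from the answer) of a decoder that applies that length rule; the table is rebuilt programmatically from the ranges above:

```python
from string import ascii_lowercase

# Rebuild the code table from the answer's three ranges.
codes = {}
letters = iter(ascii_lowercase)
for i in range(2, 8):    # a-f: 4-bit codes 0010..0111
    codes[next(letters)] = format(i, "04b")
for i in range(16, 32):  # g-v: 5-bit codes 10000..11111
    codes[next(letters)] = format(i, "05b")
for i in range(0, 4):    # w-z: 5-bit codes 00000..00011
    codes[next(letters)] = format(i, "05b")

decode_table = {v: k for k, v in codes.items()}

def decode(bits: str) -> str:
    out = []
    i = 0
    while i < len(bits):
        # Length rule: starts with 1 or with 000 -> 5 bits; otherwise 4 bits.
        n = 5 if bits[i] == "1" or bits[i:i + 3] == "000" else 4
        out.append(decode_table[bits[i:i + n]])
        i += n
    return "".join(out)

encoded = "".join(codes[c] for c in "hello")
print(decode(encoded))  # hello
```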

I got the idea by starting with a through h having length 4, using 0 as their first symbol and 1 as the first symbol of the sixteen 5-bit codes. However, a scheme like that covers only 8 + 16 = 24 letters, two short (if length is determined entirely by the first symbol), so I looked for a way to free up two of the four-symbol codes... and noticed that 0000 and 0001 were the only two that began with a triple 0. Appending one more bit to those two prefixes gives you four 5-bit characters (00000 through 00011), and the rest is an unambiguous encoding scheme :)

6 * 4 + 20 * 5 = 124

or alternatively

4 + 16 + 6 = 26
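As a quick sanity check (mine, not part of the answer), the 26 codewords can be verified to be prefix-free and to total 124 bits:

```python
codes = (
    [format(i, "04b") for i in range(2, 8)]      # a-f: 0010..0111
    + [format(i, "05b") for i in range(16, 32)]  # g-v: 10000..11111
    + [format(i, "05b") for i in range(0, 4)]    # w-z: 00000..00011
)

assert len(codes) == 26
assert sum(map(len, codes)) == 6 * 4 + 20 * 5 == 124

# Prefix-free: no codeword is a prefix of another,
# so any concatenation decodes unambiguously.
for a in codes:
    for b in codes:
        assert a == b or not b.startswith(a)
print("124 bits, prefix-free")
```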

Jesan Fafon answered Oct 03 '22 08:10