Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Designing the Oxford English Dictionary

I was asked in an interview how I would design the Oxford English Dictionary.

I told him that I'd use a TREE data structure, but he replied that it would take a lot of memory. So which other data structure should be used?

like image 439
AKIWEB Avatar asked Dec 10 '11 00:12

AKIWEB


1 Answers

One data structure I heard was used in the past in mobile phones for storing T9 dictionaries is the following (well, this addresses only the key issue, but not the definition storage):

Entries are sorted, and each entry should start with an offset into the previous entry from where it should be continued, and also the continuation. For example:

apple
4icable
7tion

would decode to apple, applicable, application. However this might not be that different from tries with merged chains, see

appl -> e 
     -> ica -> ble
            -> tion

Wikipedia uncovered the Directed acyclic word graph, which differs from trees that it not only branches, but branches can merge, where words have the same suffix. This indeed could be a superior storage.

        a
       / \
  pplic   utom
       \ /
      ation
like image 122
ron Avatar answered Sep 21 '22 00:09

ron