I was asked in an interview how I would design the Oxford English Dictionary.
I told him that I'd use a TREE data structure, but he replied that it would take a lot of memory. So which other data structure should be used?
One data structure I heard was used in the past in mobile phones for storing T9 dictionaries is the following (well, this addresses only the key issue, but not the definition storage):
Entries are sorted, and each entry should start with an offset into the previous entry from where it should be continued, and also the continuation. For example:
apple
4icable
7tion
would decode to apple, applicable, application. However this might not be that different from tries with merged chains, see
appl -> e
-> ica -> ble
-> tion
Wikipedia uncovered the Directed acyclic word graph, which differs from trees that it not only branches, but branches can merge, where words have the same suffix. This indeed could be a superior storage.
a
/ \
pplic utom
\ /
ation
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With