Which data structure is best in terms of computational complexity to implement a dictionary of (key,val) items, which must support only following commands:
Insert(key)
- appends an item (key,val) with val=1Increment(key)
- increments val of existed (key,val)Find(key)
- returns a val of (key,val)Select(part_of_key)
- returns a list of all items (key,val) if strstr(key,part_of_key)!=NULL
in the form of a new dictionary of the same type (without allocating new memory if possible); for example if dictionary is {(red,3), (blue,4), (green,1)}, then Select(re)={(red,3), (green,1)}Max(i)
- returns an item which has the i-th maximal value among all items; for example if dictionary is {(red,3), (blue,4), (green,1)}, then Max(1)=blue, Max(2)=red, Max(3)=greenThe keys are strings and the values are positive integers. The number of items in the dictionary is expected to be very large.
I think it must be a synthesis of two different data structures. But should it be a hash table + a binary tree or a trie + sorted array or something else?
A combination of balanced tree (such as red-black tree) and suffix tree (or suffix array).
NOTE: Hash table will not be able to support operation 5 efficiently.
I think you'll have a hard time implementing the suffix tree. You could possibly use Mark Nelson's C++ implementation of Ukkonen's algorithm, but it has memory leaks and is essentially a singleton, so you'll need to clean it up before being ready for production use. Even after you fix it, you'll need to adjust it so it works with your "other" data structure (which is balanced tree in my proposal) instead of one big plain string.
If you do operation 1 more frequently than operation 4 and/or you can live with linear operation 4, I recommend you skip the whole complication with the suffix tree and just traverse your data structure linearly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With