Hashes provide an excellent mechanism to extract values corresponding to some given key in almost O(1)
time. But it never preserves the order in which the keys are inserted. So is there any data structure which can simulate the best of array as well as hash, that is, return the value corresponding to a given key in O(1)
time, as well as returning the nth
value inserted in O(1)
time? The ordering should be maintained, i.e., if the hash is {a:1,b:2,c:3}
, and something like del hash[b]
has been done, nth(2)
should return {c,3}
.
Examples:
hash = {};
hash[a] = 1;
hash[b] = 2;
hash[c] = 3;
nth(2); //should return 2
hash[d] = 4;
del hash[c];
nth(3); //should return 4, as 'd' has been shifted up
Using modules like TIE::Hash
or similar stuff won't do, the onus is on me to develop it from scratch!
It depends on how much memory may be allocated for this data structure. For O(N) space there are several choices:
If unlimited space is available, you can do every operation in O(1) time.
You can use a combination of two data structures to find value by key and to find value by its insertion order. First one is a hash map (mapping key to both value and a position in other structure). Second one is tiered vector, which maps position to both value and key.
Tiered vector is a relatively simple data structure, it may be easily developed from scratch. Main idea is to split array into sqrt(N) smaller arrays, each of size sqrt(N). Each small array needs only O(sqrt N) time to shift values after deletion. And since each small array is implemented as circular buffer, small arrays can exchange a single element in O(1) time, which allows to complete "delete" operation in O(sqrt N) time (one such exchange for each sub-array between deleted value and first/last sub-array). Tiered vector allows insertion into the middle also in O(sqrt N), but this problem does not require it, so we can just append a new element at the end in O(1) time. To access element by its position, we need to determine starting position of circular buffer for sub-array, where element is stored, then get this element from circular buffer; this needs also O(1) time.
Since hash map remembers a position in tiered vector for each of its keys, it should be updated when any element in tiered vector changes position (O(sqrt N) hash map updates for each "delete").
To optimize "delete" operation even more, you can use approach, described in this answer. It deletes elements lazily and uses a trie to adjust element's position, taking into account deleted elements.
You can use a combination of three data structures to do this. First one is a hash map (mapping key to both value and a position in the array). Second one is an array, which maps position to both value and key. And third one is a bit set, one bit for each element of the array.
"Insert" operation just adds one more element to the array's end and inserts it into hash map.
"Delete" operation just unsets corresponding bit in the bit set (which is initialized with every bit = 1). Also it deletes corresponding entry from hash map. (It does not move elements of array or bit set). If, after "delete" the bit set has more than some constant proportion of elements deleted (like 10%), the whole data structure should be re-created from scratch (this allows O(1) amortized time).
"Find by key" is trivial, only hash map is used here.
"Find by position" requires some pre-processing. Prepare a 2D array. One index is the position we search. Other index is current state of our data structure, the bit set, reinterpreted as an index. Calculate population count for each prefix of every possible bit set and store prefix length, indexed by both population count and the bit set itself. Having this 2D array ready, you can perform this operation by first indexing by position and current "state" in this 2D array, then by indexing in the array with values.
Time complexity for every operation is O(1) (for insert/delete it is O(1) amortized). Space complexity is O(N 2N).
In practice, using whole bit set to index an array limits allowed value of N by pointer size (usually 64), even more it is limited by available memory. To alleviate this, we can split both the array and the bit set into sub-arrays of size N/C, where C is some constant. Now we can use a smaller 2D array to find nth element in each sub-array. And to find nth element in the whole structure, we need additional structure to record number of valid elements in each sub-array. This is a structure of constant size C, so every operation on it is also O(1). This additional structure may me implemented as an array, but it is better to use some logarithmic-time structure like indexable skiplist. After this modification, time complexity for every operation is still O(1); space complexity is O(N 2N/C).
Now, that the question is clear for me too (better late than never...) here are my proposals:
At first, I misunderstood the question, and gave a solution that gives O(1) key lookup, and O(n) lookup of nth element:
In Java, there is the LinkedHashMap for this particular task.
I think however that if someone finds this page, this might not be totally useless, so I leave it here...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With