 

Data structure with efficient manipulation and retrieval by both key and index

I'm looking for a data structure with the functionality of, e.g., the OrderedDictionary in .NET, that is to say an associative collection (i.e. one that associates a key with a value) that also maintains element order (just like a normal List does).

It must have fast lookup by both index and key. It should also have a fast "append" operation (inserting a new item at the end), and fast removal of an item at any position (by either its index or its key).

The OrderedDictionary in .NET uses both a hash table and an array to store its items, if I'm not mistaken. Retrieving an index based on a key (or vice versa) is therefore O(n), and of course removal of an item from the middle of an array is O(n) to begin with, plus the added lookup of the index from the key when removing by key.

My question is whether there exists a more efficient data structure that satisfies my conditions, or if this is indeed my best option here?

asked Oct 11 '11 by DeCaf


2 Answers

I think you can do this with two red-black trees: a key-lookup tree that stores the keys ordered by a compare function, and an index-lookup tree with the keys in arbitrary order, as in a list. Each index-lookup node must have a 'size' field: a red-black tree can do lookup by index if a 'size' field is included in each node. See for example the RedBlackTreeSet implementation in the C5 Generic Collection Library.

Each entry in the key-lookup tree needs a pointer across to its corresponding entry in the index-lookup tree. As well as left- and right-node pointers, the index-lookup tree will require a parent pointer field in each node to allow bottom-to-top navigation as well as top-to-bottom.

In all, six pointers are required for each key: the usual left and right pointers in both nodes, plus the pointer from the key-lookup node to the index-lookup node, plus the parent pointer in each index-lookup node. You would also need a pointer in each node to point to the stored value.
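
To make the layout concrete, here is a rough C# sketch of the two node types. This is only an illustration of the structure described above; the names are hypothetical, not taken from any existing library.

```csharp
// Hypothetical node layout for the two-tree structure described above.
class KeyNode<TKey, TValue>
{
    public TKey Key;
    public TValue Value;                         // the stored value
    public KeyNode<TKey, TValue> Left, Right;    // key-lookup tree links (ordered by key)
    public IndexNode<TKey, TValue> Twin;         // pointer across to the index-lookup tree
    public bool IsRed;                           // red-black colour bit
}

class IndexNode<TKey, TValue>
{
    public KeyNode<TKey, TValue> KeyEntry;       // back to the key entry, so delete-by-index can find the key
    public IndexNode<TKey, TValue> Left, Right;  // index-lookup tree links (ordered by position)
    public IndexNode<TKey, TValue> Parent;       // needed for the bottom-up rank walk
    public int Size = 1;                         // subtree size: enables lookup by index
    public bool IsRed;                           // red-black colour bit
}
```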

Operations:

Append - An append operation would insert the key into both trees - once in the key-lookup tree, at a position determined by the compare function, and again in the rightmost position of the index-lookup tree. Insertion into a red-black tree is a logarithmic time operation.

Lookup by key - this is done on the key-lookup tree, using the compare function to find the correct position - O(log(n))

Lookup by index - this can be done on the index-lookup tree using the 'size' fields, as mentioned above - O(log(n))

Get index from key - first look up the key in the key-lookup tree, O(log(n)). Follow the pointer across to the index-lookup tree. Follow the parent pointers up to the root node (O(log(n)) for a balanced tree), using the 'size' fields on the way up to determine the index of the key - O(log(n)) overall. A sketch of this walk appears just after this list.

Delete by index - look up the item in the index-lookup tree and delete it from that tree. Then look up the located key in the key-lookup tree and delete it from the key-lookup tree. All operations are O(log(n)), so delete is O(log(n)) overall.

Delete by key - use 'Get index from key' to get the index of the key. Delete by index from the index-lookup tree. Delete by key from the key-lookup tree. O(log(n)) overall.
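
Here is a rough C# sketch of the bottom-up rank walk used by "Get index from key", assuming the hypothetical IndexNode layout sketched earlier:

```csharp
// Sketch: compute the zero-based position (rank) of an index-lookup node by
// walking up to the root and summing the sizes of subtrees that precede it.
static int IndexOf<TKey, TValue>(IndexNode<TKey, TValue> node)
{
    // Everything in this node's own left subtree comes before it.
    int rank = node.Left?.Size ?? 0;

    for (var current = node; current.Parent != null; current = current.Parent)
    {
        // If we are a right child, the parent and its whole left subtree
        // also come before us.
        if (current == current.Parent.Right)
            rank += (current.Parent.Left?.Size ?? 0) + 1;
    }
    return rank; // O(log(n)): one step per tree level
}
```

Get index from key then amounts to following the cross pointer from the key-lookup node and calling IndexOf on its twin.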

This structure also supports O(log(n)) insertion at any arbitrary position, not just at the end.

Storage overhead is obviously considerable, but remains O(n). Time complexity meets all the requirements.

Unfortunately I am not aware of any implementation of this structure.

Update: It occurs to me you can combine a tree with a hash table to get O(1) by-key lookup. Instead of having two trees, as I suggest above, use a hash table for the by-key lookup and a balanced order-statistics tree for the by-position lookup, as above, but have the slots of the hash table contain pointers to the nodes of the balanced tree for doing the get-list-position-by-key lookup. By-key lookups are now O(1), with everything else staying O(log(n)) on average. Of course, you now get the occasional O(n) re-hash penalty, as with any hash table.
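
A minimal C# sketch of that hybrid might look like the following. The names are illustrative only, and insertion, removal and rebalancing are omitted; this is not a full implementation.

```csharp
using System.Collections.Generic;

// Sketch of the hybrid: a hash table for O(1) key -> node lookup, plus a
// balanced order-statistics tree for lookup by position and rank queries.
class HybridOrderedMap<TKey, TValue>
{
    class Node
    {
        public TKey Key;
        public TValue Value;
        public Node Left, Right, Parent;
        public int Size = 1;      // subtree size for lookup-by-index and rank queries
        public bool IsRed;        // red-black colour bit
    }

    // Hash table slots point at the tree nodes, as described above.
    private readonly Dictionary<TKey, Node> _byKey = new Dictionary<TKey, Node>();
    private Node _root;           // balanced (e.g. red-black) order-statistics tree

    public TValue this[TKey key] => _byKey[key].Value;   // O(1) expected

    public int IndexOfKey(TKey key)                      // O(1) hash lookup + O(log(n)) rank walk
    {
        var node = _byKey[key];
        int rank = node.Left?.Size ?? 0;
        for (var cur = node; cur.Parent != null; cur = cur.Parent)
            if (cur == cur.Parent.Right)
                rank += (cur.Parent.Left?.Size ?? 0) + 1;
        return rank;
    }

    // Append, RemoveAt and Remove(key) would insert/delete in the tree (rebalancing
    // and updating Size fields along the path) and keep _byKey in sync; omitted here.
}
```

The occasional O(n) re-hash mentioned above comes from the Dictionary growing; the tree side stays O(log(n)) throughout.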

answered Sep 18 '22 by Crosbie


OrderedDictionary actually meets your requirements.

Your analysis of the OrderedDictionary is incorrect. It is actually O(1) for key-based lookup and O(1) for lookup by index, according to this.

Even simple analysis gives you O(1) lookup either by key or index. Arrays provide O(1) access and hash tables provide effectively O(1) access.

Insert/delete is a little more complicated, but under amortized analysis it is still O(1).

The article claims that it is O(n) for insert and delete. This at least does not hold for insert, since amortized analysis allows you to simply raise the "cost" of inserting a given element from 1 to 2. When an insert requires an array resize, the second half of that cost pays for the copying. The final insert will take longer, but it's still O(1) amortized, and the discrepancy only shows up when you trigger an array resize, which is rare.
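
As an illustration of that amortized argument (this is not OrderedDictionary's actual code), a doubling-array append looks roughly like this:

```csharp
using System;

// Illustration of amortized-O(1) append with a doubling array.
class GrowableArray<T>
{
    private T[] _items = new T[4];
    private int _count;

    public void Append(T item)
    {
        if (_count == _items.Length)
        {
            // The occasional O(n) copy is "paid for" by the cheap appends since the
            // last resize: with doubling, each element is copied O(1) times on average.
            Array.Resize(ref _items, _items.Length * 2);
        }
        _items[_count++] = item;
    }
}
```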

answered Sep 21 '22 by imichaelmiers