My goal is to create a data structure implementing the IList<T> interface that achieves O(1) element lookup time at the cost of extra memory.
Background
As you know, all array-based IList<T> implementations, such as List<T>, have O(n) element lookup time. This means that operations like int IndexOf(T element) or bool Contains(T element) iterate through the underlying array until they find a match.
A well-known idea to achieve this is to use a combination of a list and a hash table as the underlying data structures. The values are kept in the list; the hash table keeps the list's values as keys and their indexes as values, so lookup can be performed through the hash table.
That's exactly how KeyedCollection<TKey, TItem> (see MSDN) is implemented.
What I have tried so far
internal class MyList<T> : KeyedCollection<T, T>
{
    protected override T GetKeyForItem(T item)
    {
        return item;
    }
}
This worked, except for one problem: this data structure does not exactly mimic the behavior of List<T>. The point is that List<T> allows duplicates, while MyList does not.
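A minimal sketch of how that restriction surfaces (repeating the MyList<T> definition from above so the snippet is self-contained): KeyedCollection<TKey, TItem> gives an O(1) hash-based Contains, but throws an ArgumentException as soon as a second item produces the same key.

```csharp
using System;
using System.Collections.ObjectModel;

internal class MyList<T> : KeyedCollection<T, T>
{
    protected override T GetKeyForItem(T item)
    {
        return item;
    }
}

internal static class Demo
{
    private static void Main()
    {
        var list = new MyList<string>();
        list.Add("a");

        // Resolves to KeyedCollection's key-based Contains: an O(1) dictionary lookup.
        Console.WriteLine(list.Contains("a")); // True

        try
        {
            list.Add("a"); // second item with the same key
        }
        catch (ArgumentException)
        {
            Console.WriteLine("duplicates are rejected");
        }
    }
}
```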
Question
Is there any ready-to-use data structure, or can you recommend an elegant way of implementing IList<T>, so that:
- element lookup is an O(1) operation;
- all other operations keep the same O() performance as List<T>;
- memory consumption stays linear (constantA + constantB * n bytes)?
The only way I can see to do this is to use a dictionary of lists. Hitting the key gives you a list of the indexes of all the duplicates that produce that particular key. Just always take the first one.
Building on what Ryan Bennett has proposed, I think the best you are going to come up with (since you state that order is important) is to create a class that implements IList<T> and internally looks something like this:
class MyList<T> : IList<T>
{
    private readonly Dictionary<T, List<int>> _indexMap = new Dictionary<T, List<int>>();
    private readonly List<T> _items = new List<T>();

    public int IndexOf(T item)
    {
        List<int> indices;
        if (_indexMap.TryGetValue(item, out indices))
        {
            return indices[0];
        }
        return -1;
    }

    public void Add(T item)
    {
        List<int> indices;
        if (!_indexMap.TryGetValue(item, out indices))
        {
            indices = new List<int>();
            _indexMap[item] = indices;
        }
        indices.Add(_items.Count);
        _items.Add(item);
    }

    // Attempt at a Remove implementation; this could probably be improved,
    // but here is my first crack at it.
    public bool Remove(T item)
    {
        List<int> indices;
        if (!_indexMap.TryGetValue(item, out indices))
        {
            // Not found, so we can just return false.
            return false;
        }
        int index = indices[0];
        indices.RemoveAt(0);
        if (indices.Count == 0)
        {
            _indexMap.Remove(item);
        }
        // Every stored index after the removed position shifts down by one.
        // Walking the map's values (rather than the item list) visits each
        // index list exactly once, even when an item occurs several times.
        foreach (List<int> otherIndexList in _indexMap.Values)
        {
            for (int j = 0; j < otherIndexList.Count; ++j)
            {
                if (otherIndexList[j] > index)
                {
                    otherIndexList[j]--;
                }
            }
        }
        _items.RemoveAt(index); // List<T>.RemoveAt returns void
        return true;
    }

    // ... Other similar type functions here
}
Edit:
Just realized that things get really sticky here when you do a Remove. You have to walk the collections of indices and update any index whose value is greater than the index of the item you removed. You have now increased the "remove" time, and you have also made it tricky to get correct. I would throw a massive amount of unit tests around this collection if you were going to try to implement something like this.
I know you are stating that order is important, so I am assuming that is why you are not going with a sorted-list approach, which would allow duplicates and give you O(log n) operation times.
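For completeness, a quick sketch of that sorted alternative: keeping a plain List<T> sorted lets BinarySearch locate an element in O(log n), and duplicates are allowed, but the original insertion order is lost (which is why it does not fit the question's requirements).

```csharp
using System;
using System.Collections.Generic;

internal static class SortedLookupDemo
{
    private static void Main()
    {
        var items = new List<int> { 5, 3, 8, 3 };
        items.Sort(); // now 3, 3, 5, 8 -- insertion order is gone

        // BinarySearch is O(log n); a non-negative result means "found"
        // (with duplicates it may point at any one of the equal elements).
        bool containsThree = items.BinarySearch(3) >= 0;
        bool containsSeven = items.BinarySearch(7) >= 0;
        Console.WriteLine(containsThree); // True
        Console.WriteLine(containsSeven); // False
    }
}
```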
Edit 2: Another bookkeeping-type approach
I am just bouncing this one around in my head, so I will only give some rough pseudo-code, but you could possibly take an approach where you have a dictionary of items mapped to a list of indices, and a second dictionary that maps indices back to items. If you add the restriction that T is a class, you are only paying the overhead of storing each reference twice. You then need to maintain a current "last" index so that you can easily append a new item to the collection. This should make the remove operation a bit cleaner. It is still O(n), because you have to update anything with an index greater than the removed item's. In first imaginings, this seems like a potential solution that will get you close to what you want to achieve (if I am understanding the goals correctly).
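Roughly, the bookkeeping described above could look like the sketch below. This is my own illustrative fleshing-out of that rough idea, not a full IList<T> implementation; the type name IndexedBag and all member names are hypothetical, and Remove is deliberately omitted since it is the part still being bounced around.

```csharp
using System;
using System.Collections.Generic;

// Two-dictionary bookkeeping: one map from item to its indices, one map
// from index back to the item, plus a running "last" position counter.
// The T : class constraint means each entry stores only two references.
internal class IndexedBag<T> where T : class
{
    private readonly Dictionary<T, List<int>> _indicesByItem = new Dictionary<T, List<int>>();
    private readonly Dictionary<int, T> _itemByIndex = new Dictionary<int, T>();
    private int _count; // the current "last" position

    public int Count { get { return _count; } }

    public void Add(T item)
    {
        List<int> indices;
        if (!_indicesByItem.TryGetValue(item, out indices))
        {
            indices = new List<int>();
            _indicesByItem[item] = indices;
        }
        indices.Add(_count);      // indices stay in ascending order
        _itemByIndex[_count] = item;
        _count++;
    }

    // O(1): one hash lookup, then the first (smallest) stored index.
    public int IndexOf(T item)
    {
        List<int> indices;
        return _indicesByItem.TryGetValue(item, out indices) ? indices[0] : -1;
    }

    // O(1) positional access via the reverse map.
    public T this[int index]
    {
        get { return _itemByIndex[index]; }
    }
}
```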
The hash table should hold a list of indices for each key. I think that is all you need, no?