Data structure with O(1) lookup time which will allow duplicates [closed]

My goal is to create a data structure implementing the IList<T> interface that achieves O(1) element lookup time by trading memory for speed.

Background As you know, all array-based IList<T> implementations, such as List<T>, have O(n) element lookup time. That means operations like int IndexOf(T element) or bool Contains(T element) iterate through the underlying array until they find a match.

A well-known idea to achieve this is to use a combination of a list and a hashtable as underlying data structures. Values are kept in the list; the hashtable keeps the list's values as keys and their indexes as values, so lookup can be performed via the hashtable.

That's exactly how KeyedCollection<TKey, TItem> (see MSDN) is implemented.

What I have tried so far

internal class MyList<T> : KeyedCollection<T, T>
{
    protected override T GetKeyForItem(T item)
    {
        return item;
    }
}

This works except for one problem: the data structure does not mimic the behavior of List<T> exactly. List<T> allows duplicates; MyList does not.
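A minimal repro of the duplicate problem (MyList<T> repeated from above so this compiles on its own; the helper class and method names are mine):

```csharp
using System;
using System.Collections.ObjectModel;

// MyList<T> from the question, repeated so the repro is self-contained.
internal class MyList<T> : KeyedCollection<T, T>
{
    protected override T GetKeyForItem(T item)
    {
        return item;
    }
}

internal static class DuplicateRepro
{
    // Returns true when adding a duplicate throws.
    // KeyedCollection throws ArgumentException for a duplicate key,
    // whereas List<T>.Add would simply accept the second "a".
    internal static bool DuplicateThrows()
    {
        var list = new MyList<string>();
        list.Add("a");
        try { list.Add("a"); return false; }
        catch (ArgumentException) { return true; }
    }
}
```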

Question

Is there any ready-to-use data structure, or can you recommend an elegant way of implementing IList<T>, such that:

  1. Lookup operations have O(1) time.
  2. All other operations have the same O() performance as List<T>.
  3. Memory may be compromised by hashtable overhead (constantA + constantB * n bytes).
  4. Duplicates must be allowed.
  5. Allowing nulls is optional (they can be boxed into null objects).
asked Nov 29 '12 by George Mamaladze


3 Answers

The only way I can see to do this is with a dictionary of lists. Hitting the key gives you a list of the indexes of all duplicates that produce that particular key; just always take the first one.
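A minimal sketch of that idea (class and method names are mine, not from the answer): a Dictionary<T, List<int>> maps each value to the ascending list of positions at which it occurs, and IndexOf returns the first entry or -1.

```csharp
using System;
using System.Collections.Generic;

// Sketch: value -> all positions where it occurs (kept in ascending order).
public static class IndexLookupSketch
{
    public static Dictionary<T, List<int>> BuildIndex<T>(IList<T> items)
    {
        var map = new Dictionary<T, List<int>>();
        for (int i = 0; i < items.Count; i++)
        {
            if (!map.TryGetValue(items[i], out var positions))
                map[items[i]] = positions = new List<int>();
            positions.Add(i); // appended in order, so the list stays sorted
        }
        return map;
    }

    // O(1) expected time: one hash lookup, then the first (lowest) position.
    public static int IndexOf<T>(Dictionary<T, List<int>> map, T item)
        => map.TryGetValue(item, out var positions) ? positions[0] : -1;
}
```

For example, indexing `new[] { "a", "b", "a" }` gives `a -> [0, 2]` and `b -> [1]`, so `IndexOf` for "a" returns 0, matching List<T>.IndexOf.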

answered Dec 22 '22 by Ryan Bennett


Building on what Ryan Bennett proposed, I think the best you can do (since you state that order is important) is to create a class that implements IList<T> and internally looks something like this:

class MyList<T> : IList<T>
{
    private readonly Dictionary<T, List<int>> _indexMap = new Dictionary<T, List<int>>();
    private readonly List<T> _items = new List<T>();


    public int IndexOf(T item)
    {
        List<int> indices;
        if(_indexMap.TryGetValue(item, out indices))
        {
            return indices[0];
        }
        return -1;
    }

    public void Add(T item)
    {
        List<int> indices;
        if(!_indexMap.TryGetValue(item, out indices))
        {
            indices = new List<int>();
            _indexMap[item] = indices;
        }

        indices.Add(_items.Count);
        _items.Add(item);
    }

    // Attempt at a Remove implementation, this could probably be improved
    // but here is my first crack at it
    public bool Remove(T item)
    {
        List<int> indices;
        if(!_indexMap.TryGetValue(item, out indices))
        {
            // Not found so can just return false
            return false;
        }

        int index = indices[0];
        indices.RemoveAt(0);
        if (indices.Count == 0)
        {
            _indexMap.Remove(item);
        }

        // Every stored index greater than the removed position shifts left by one.
        // Walk each index list exactly once; walking the item list instead would
        // visit a duplicate's index list once per occurrence and decrement twice.
        foreach (List<int> otherIndexList in _indexMap.Values)
        {
            for (int j = 0; j < otherIndexList.Count; ++j)
            {
                if (otherIndexList[j] > index)
                {
                    otherIndexList[j]--;
                }
            }
        }

        _items.RemoveAt(index);
        return true;
    }

    // ... Other similar type functions here
}

Edit:

Just realized that things get really sticky when you do a Remove. You have to walk the stored indices and decrement every index with a value greater than the index of the removed item. You have now increased the "remove" time, and you have also made it tricky to get correct. I would throw a massive number of unit tests around this collection if you were going to implement something like this.
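To make that stickiness concrete, here is a small standalone illustration (not the class above; names are mine) of how the index map goes stale the moment an element is removed:

```csharp
using System;
using System.Collections.Generic;

// Standalone illustration of why stored indices go stale after a Remove.
public static class RemoveShiftDemo
{
    // Builds the value -> positions map for a list.
    public static Dictionary<string, List<int>> Index(List<string> items)
    {
        var map = new Dictionary<string, List<int>>();
        for (int i = 0; i < items.Count; i++)
        {
            if (!map.TryGetValue(items[i], out var positions))
                map[items[i]] = positions = new List<int>();
            positions.Add(i);
        }
        return map;
    }
}

// With items = ["a", "b", "a"], the map is a -> [0, 2], b -> [1].
// After items.RemoveAt(1), the second "a" really sits at index 1,
// but the map still says 2 -- every stored index > 1 must be decremented.
```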

I know you state that order is important, so I assume that is why you are not going with a sorted-list approach, which would allow duplicates and give you O(log n) operation times.

Edit 2: Another bookkeeping-type approach
I am just bouncing this one around in my head, so I will only give some rough pseudocode, but you could take an approach where you have one dictionary mapping items to a list of indices and a second dictionary mapping indices to items. If you add the restriction that T is a class, you are only paying the overhead of storing each reference twice. You then need to maintain a current "last" so that you can easily add a new item to the collection. This should make the remove operation a bit cleaner. It is still O(n), because you have to update anything with an index greater than that of the removed item. On first consideration, this seems like a potential solution that gets you close to what you want to achieve (if I am understanding the goals correctly).
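A rough sketch of that two-dictionary bookkeeping (all names are illustrative, not a finished IList<T>): one map from item to its index list, one from index to item, and a running count acting as the "last" marker.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the two-dictionary bookkeeping; T : class per the suggested restriction.
public class DualIndex<T> where T : class
{
    private readonly Dictionary<T, List<int>> _itemToIndices = new Dictionary<T, List<int>>();
    private readonly Dictionary<int, T> _indexToItem = new Dictionary<int, T>();
    private int _count; // the "last" marker: next free index

    public void Add(T item)
    {
        if (!_itemToIndices.TryGetValue(item, out var indices))
            _itemToIndices[item] = indices = new List<int>();
        indices.Add(_count);       // appended at the end, so the list stays sorted
        _indexToItem[_count] = item;
        _count++;
    }

    // O(1) expected: hash lookup, then the lowest stored position.
    public int IndexOf(T item)
        => _itemToIndices.TryGetValue(item, out var indices) ? indices[0] : -1;

    public T this[int index] => _indexToItem[index];
    public int Count => _count;
}
```

Remove is deliberately omitted here; as the edit above notes, it would still have to decrement every index greater than the removed position in both dictionaries, keeping it O(n).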

answered Dec 22 '22 by pstrjds


The hash table should hold a list of indices for each key. I think that is all you need, no?

answered Dec 22 '22 by zenpoy