Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is List<T>.Enumerator faster than my implementation?

Tags:

c#

.net

I've found myself in a position where I have to roll my own dynamic array implementation, due to various large performance benefits (in my case). However, after creating an enumerator for my version, and comparing the efficiency with the one List uses, I'm a bit bewildered; the List one is aproximately 30-40% faster than my version, even though it's much more complex.

Here's the important part of the List enumerator implementation:

public struct Enumerator : IEnumerator<T>, IDisposable, IEnumerator
{
    private List<T> list;
    private int index;
    private int version;
    private T current;
    internal Enumerator(List<T> list)
    {
        this.list = list;
        this.index = 0;
        this.version = list._version;
        this.current = default(T);
        return;
    }

    public bool MoveNext()
    {
        List<T> list;
        list = this.list;
        if (this.version != list._version)
        {
            goto Label_004A;
        }
        if (this.index >= list._size)
        {
            goto Label_004A;
        }
        this.current = list._items[this.index];
        this.index += 1;
        return 1;
        Label_004A:
        return this.MoveNextRare();
    }

    public T Current
    {
        get {  return this.current; }
    }
}

And here's my very barebone version:

internal struct DynamicArrayEnumerator<T> : IEnumerator<T> where T : class
{
     private readonly T[] internalArray;
     private readonly int lastIndex;
     private int currentIndex;

     internal DynamicArrayEnumerator(DynamicArray<T> dynamicArray)
     {
          internalArray = dynamicArray.internalArray;
          lastIndex = internalArray.Length - 1;
          currentIndex = -1;
     }

     public T Current
     {
          get { return internalArray[currentIndex]; }
     }

     public bool MoveNext()
     {
          return (++currentIndex <= lastIndex);
     }
}

I know this is micro-optimization, but I'm actually interested in understanding why the List enumerator is so much faster than mine. Any ideas? Thanks!

Edit: As requested; the DynamicArray class (the relevant parts): The enumerator is an inner class in this.

public struct DynamicArray<T> : IEnumerable<T> where T : class
{
    private T[] internalArray;
    private int itemCount;

    internal T[] Data
    {
        get { return internalArray; }
    }

    public int Count
    {
        get { return itemCount; }
    }

    public DynamicArray(int count)
    {
        this.internalArray = new T[count];
        this.itemCount = 0;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return new DynamicArrayEnumerator<T>(this);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

}

As for how I'm testing:

 List<BaseClass> list = new List<BaseClass>(1000000);
 DynamicArray<BaseClass> dynamicArray = new DynamicArray<BaseClass>(1000000);

// Code for filling with data omitted.

   int numberOfRuns = 0;
   float p1Total = 0;
   float p2Total = 0;
   while (numberOfRuns < 100)
   {
        PerformanceAnalyzer p1 = new PerformanceAnalyzer(() =>
        {
             int u = 0;
             foreach (BaseClass b in list)
             {
                  if (b.B > 100)   // Some trivial task
                      u++;
             }
        });
        p1.ExecuteAndClock();
        p1Total += p1.TotalElapsedTicks;

        PerformanceAnalyzer p2 = new PerformanceAnalyzer(() =>
        {
             int u = 0;
             foreach (BaseClass b in dynamicArray)
             {
                  if (b.B > 100)  // Some trivial task
                       u++;
             }
        });
        p2.ExecuteAndClock();
        p2Total += p2.TotalElapsedTicks;

        numberOfRuns++;
    }

    Console.WriteLine("List enumeration: " + p1Total / totalRuns + "\n");
    Console.WriteLine("Dynamic array enumeration: " + p2Total / totalRuns + "\n");

The PerformanceAnalyzer class basically starts a Stopwatch, execute the supplied Action delegate, and then stop the Stopwatch afterwards.

Edit 2 (Quick answer to Ryan Gates): There's a few reasons why I would want to roll my own, most importantly I need a very fast RemoveAt(int index) method.

Since I don't have to worry about the order of the list elements in my particular case, I can avoid the .Net built-in list's way of doing it:

public void RemoveAt(int index)
{
    T local;
    if (index < this._size)
    {
        goto Label_000E;
    }
    ThrowHelper.ThrowArgumentOutOfRangeException();
Label_000E:
    this._size -= 1;
    if (index >= this._size)
    {
        goto Label_0042;
    }
    Array.Copy(this._items, index + 1, this._items, index, this._size - index);
Label_0042:
    this._items[this._size] = default(T);
    this._version += 1;
    return;
}

And instead using something along the lines of:

public void RemoveAt(int index)
{
     // overwrites the element at the specified index with the last element in the array and decreases the item count.
     internalArray[index] = internalArray[itemCount];  
     itemCount--;
}

Potencially saving enormous amounts of time in my case, if say the first 1000 elements in a long list have to be removed by index.

like image 233
H.S. Avatar asked Dec 06 '12 20:12

H.S.


1 Answers

Okay, aside from benchmarking problems, here's how you can make your DynamicArray class more like List<T>:

public DynamicArrayEnumerator<T> GetEnumerator()
{
    return new DynamicArrayEnumerator<T>(this);
}

IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
    return GetEnumerator();
}

IEnumerator IEnumerable.GetEnumerator()
{
    return this.GetEnumerator();
}

Now, code which knows it's working with a dynamic array can iterate with a DynamicArrayEnumerator<T> without any boxing, and without virtual dispatch. This is exactly what List<T> does. The compiler notices when a type implements the pattern in a custom manner, and will use the types involved instead of the interfaces.

With your current code, you're getting no benefit from creating a struct - because you're boxing it in GetEnumerator().

Try the above change and fix the benchmark to work for longer. I'd expect to see a big difference.

like image 163
Jon Skeet Avatar answered Sep 28 '22 05:09

Jon Skeet