Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extension methods on IEnumerable<T>: how is it performance?

From my mentor: Prefer native methods (implemented directly on the collection) over extension methods of IEnumerable, because:

The LINQ-to-Objects extension methods are implemented on IEnumerable, meaning that in the worst-case scenario (when the item you search for does not exist in the collection) you will have to enumerate thru all elements. If you have a Contains or Exists method implemented directly on the collection, it could make use of internal knowledge and maybe just do a hash table look up or some other quick operation.

I was a deeply confused, because I think Microsoft should have implemented hash table for IEnumerable Contains/Exists already. A quick benchmark with List and IEnumerable show no differences:

static void Main(string[] args)
{
    Console.Write("input the number of elements: ");
    int count = Convert.ToInt32(Console.ReadLine());
    Console.Write("input the number of loops: ");
    int loop = Convert.ToInt32(Console.ReadLine());

    Random r = new Random();

    Stopwatch sw = new Stopwatch();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            DoContains(list, r.Next());
        }
        sw.Stop();
    }

    Console.WriteLine("List<T> native method: Iterated {0} times on {1} elements, elapsed :{2}",loop,count,sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            DoContainsEnumerable(list, r.Next());
        }
        sw.Stop();
    }

    Console.WriteLine("IEnumerable<T> extension method: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            //make sure that the element is not in the list
            DoContains(list, r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("List<T> native method: element does not exist:Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            //make sure that the element is not in the list
            DoContainsEnumerable(list, r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);


    Console.ReadKey();
}

static List<int> CreateListOfInt(int count)
{
    Random r = new Random(1000);
    List<int> numbers = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(r.Next());
    }
    return numbers;
}

static bool DoContains(List<int> list, int number)
{
    return list.Contains(number);
}

static bool DoContainsEnumerable(IEnumerable<int> list, int number)
{
    return list.Contains(number);
}


//define the scope of randomly created number, to make sure that lookup number will not in the List
static List<int> CreateListOfInt2(int count)
{
    Random r = new Random(1000);
    List<int> numbers = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(r.Next(0,10000));
    }
    return numbers;
}

}

Edit: I tried HashSet implementation, which greatly increases performance:

  sw.Reset();
            for (int i = 0; i < loop; i++)
            {
                var list = CreateListOfInt2(count);
                HashSet<int> hashtable = new HashSet<int>(list);
                sw.Start();
                for (int j = 0; j < count; j++)
                {
                    //make sure that the element is not in the list
                    hashtable.Contains(r.Next(20000, 50000));
                }
                sw.Stop();
            }
            Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

Still, what is your opinion about my mentor saying?

Can anyone clear out for me? Is my mentor right? If he's right, what is wrong with my code?

Thank you very much

like image 630
Quan Mai Avatar asked Oct 11 '22 02:10

Quan Mai


1 Answers

List<T> Contains calls are just iterating the list, so they won't be faster than the extension method. If you were to use a HashSet<T> and try a series of Contains() operations, you would find a marked improvement.

Edit: the reason Microsoft didn't utilize a hash for the IEnumerable<T> extension methods is they could not guarantee that the implementing class used a hash or something similar. They had to go with the naive approach because the IEnumerable<T> interface only guarantees that the implementing class be enumerated.

like image 112
dlev Avatar answered Oct 15 '22 10:10

dlev