Where vs. foreach with if - why different results?

Tags: closures, c#, .net

This code

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication
{
    internal class Program
    {
        public static void Main()
        {
            var values = new[] {1, 2, 3, 3, 2, 1, 4};
            var distinctValues = GetDistinctValuesUsingWhere(values);
            Console.WriteLine("GetDistinctValuesUsingWhere No1: " + string.Join(",", distinctValues));
            Console.WriteLine("GetDistinctValuesUsingWhere No2: " + string.Join(",", distinctValues));
            distinctValues = GetDistinctValuesUsingForEach(values);
            Console.WriteLine("GetDistinctValuesUsingForEach No1: " + string.Join(",", distinctValues));
            Console.WriteLine("GetDistinctValuesUsingForEach No2: " + string.Join(",", distinctValues));
            Console.ReadLine();
        }

        private static IEnumerable<T> GetDistinctValuesUsingWhere<T>(IEnumerable<T> items)
        {
            var set=new HashSet<T>();
            return items.Where(i=> set.Add(i));
        }

        private static IEnumerable<T> GetDistinctValuesUsingForEach<T>(IEnumerable<T> items)
        {
            var set=new HashSet<T>();
            foreach (var i in items)
            {
                if (set.Add(i))
                    yield return i;
            }
        }
    }
}

results in the following output:

GetDistinctValuesUsingWhere No1: 1,2,3,4

GetDistinctValuesUsingWhere No2:

GetDistinctValuesUsingForEach No1: 1,2,3,4

GetDistinctValuesUsingForEach No2: 1,2,3,4

I don't understand why the line "GetDistinctValuesUsingWhere No2" is empty.

Can anyone explain this to me?

UPDATE: After Scott's answer, I changed the example to the following:

    private static IEnumerable<T> GetDistinctValuesUsingWhere2<T>(IEnumerable<T> items)
    {
        var set = new HashSet<T>();
        var capturedVariables = new CapturedVariables<T> {set = set};

        foreach (var i in items)
            if (capturedVariables.set.Add(i))
                yield return i;
        //return Where2(items, capturedVariables);
    }

    private static IEnumerable<T> Where2<T>(IEnumerable<T> source, CapturedVariables<T> variables)
    {
        foreach (var i in source)
            if (variables.set.Add(i))
                yield return i;
    }

    private class CapturedVariables<T>
    {
        public HashSet<T> set;
    }

This prints the output 1,2,3,4 twice.

However, if I just uncomment the line

return Where2(items, capturedVariables);

and comment out the lines

foreach (var i in items) if (capturedVariables.set.Add(i)) yield return i;

in the method GetDistinctValuesUsingWhere2, I get the output 1,2,3,4 only once, even though the commented-out lines are exactly the same as the body of Where2.

I still don't get it...

Asked Aug 25 '17 18:08 by Urs Meili


2 Answers

GetDistinctValuesUsingWhere No2 returns no results because of variable capture.

Your Where-based method behaves roughly like this:

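    private static IEnumerable<T> GetDistinctValuesUsingWhere<T>(IEnumerable<T> items)
    {
        var set = new HashSet<T>();
        // The compiler hoists the captured local into a hidden object,
        // created once per call to this method
        var capturedVariables = new CapturedVariables<T> { set = set };
        return Where(items, capturedVariables);
    }

    private static IEnumerable<T> Where<T>(IEnumerable<T> source, CapturedVariables<T> variables)
    {
        foreach (var i in source)
        {
            // Every enumeration shares variables.set, so a second pass finds
            // all values already added and yields nothing
            if (variables.set.Add(i))
                yield return i;
        }
    }

    private class CapturedVariables<T>
    {
        public HashSet<T> set;
    }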

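So both methods use yield return under the hood, but GetDistinctValuesUsingWhere shares one HashSet across every enumeration of the sequence it returns, whereas GetDistinctValuesUsingForEach creates a new HashSet each time its result is enumerated. The first enumeration fills the shared set, so the second one skips every value.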
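A minimal sketch of two ways to get 1,2,3,4 on every enumeration, assuming lazy deduplication is the goal. The helper DistinctPerEnumeration and the Demo class are hypothetical names, not from the question: because the helper is itself an iterator, its HashSet is created anew on each enumeration even though it still uses Where with a capturing lambda. The built-in Enumerable.Distinct likewise builds its set per enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal static class Demo
{
    // Hypothetical helper: an iterator method, so its whole body, including
    // the HashSet creation, runs again every time the result is enumerated.
    public static IEnumerable<T> DistinctPerEnumeration<T>(IEnumerable<T> items)
    {
        var set = new HashSet<T>();
        foreach (var i in items.Where(x => set.Add(x)))
            yield return i;
    }

    public static void Main()
    {
        var values = new[] { 1, 2, 3, 3, 2, 1, 4 };

        var viaIterator = DistinctPerEnumeration(values);
        Console.WriteLine(string.Join(",", viaIterator)); // 1,2,3,4
        Console.WriteLine(string.Join(",", viaIterator)); // 1,2,3,4

        // Enumerable.Distinct also creates its set per enumeration
        // (first-occurrence order in practice; the docs call it unordered).
        var viaDistinct = values.Distinct();
        Console.WriteLine(string.Join(",", viaDistinct)); // 1,2,3,4
        Console.WriteLine(string.Join(",", viaDistinct)); // 1,2,3,4
    }
}
```

If you only need a fixed snapshot, calling ToList() once on the Where result also avoids the problem, at the cost of evaluating eagerly.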

Answered Nov 02 '22 05:11 by Scott Chamberlain


Answering the updated version:

  • In the case of the GetDistinctValuesUsingWhere2() method containing the foreach loop, the method itself is an iterator (it uses yield return), so the compiler moves its whole body, including the set initialization statement, into the returned IEnumerable. That statement is therefore executed each time you start iterating the enumerable, not during the original call to GetDistinctValuesUsingWhere2().
  • In the other variant, where you return Where2(), GetDistinctValuesUsingWhere2() contains neither yield return nor a delegate, so it is an ordinary method whose body runs immediately when called. The IEnumerable it returns comes from Where2(), which captures only the foreach loop and its parameters (already initialized), not the set initialization statement. So this time the set is created only once, during the original call to GetDistinctValuesUsingWhere2(), and every enumeration shares it.
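To see this timing difference without breakpoints, here is a simplified sketch that logs when each variant creates its set. It assumes int instead of the generic T and passes the HashSet directly instead of the CapturedVariables<T> wrapper; TimingDemo, IteratorVariant and DelegatingVariant are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

internal static class TimingDemo
{
    // Variant A: an iterator. Nothing in the body runs until enumeration
    // starts, and each new enumeration re-runs the body from the top.
    public static IEnumerable<int> IteratorVariant(IEnumerable<int> items)
    {
        Console.WriteLine("  IteratorVariant: creating set");
        var set = new HashSet<int>();
        foreach (var i in items)
            if (set.Add(i))
                yield return i;
    }

    // Variant B: an ordinary method. Its body runs immediately, once,
    // and the single set is handed to the iterator Where2.
    public static IEnumerable<int> DelegatingVariant(IEnumerable<int> items)
    {
        Console.WriteLine("  DelegatingVariant: creating set");
        var set = new HashSet<int>();
        return Where2(items, set);
    }

    private static IEnumerable<int> Where2(IEnumerable<int> source, HashSet<int> set)
    {
        foreach (var i in source)
            if (set.Add(i))
                yield return i;
    }

    public static void Main()
    {
        var values = new[] { 1, 2, 3, 3, 2, 1, 4 };

        Console.WriteLine("Calling IteratorVariant");
        var a = IteratorVariant(values);        // logs nothing yet
        Console.WriteLine(string.Join(",", a)); // logs "creating set", then 1,2,3,4
        Console.WriteLine(string.Join(",", a)); // logs "creating set" again, then 1,2,3,4

        Console.WriteLine("Calling DelegatingVariant");
        var b = DelegatingVariant(values);      // logs "creating set" once, right here
        Console.WriteLine(string.Join(",", b)); // 1,2,3,4
        Console.WriteLine(string.Join(",", b)); // empty line: the shared set is already full
    }
}
```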

If it helps, set breakpoints at various points in your code and watch when each statement actually runs: this should make what I tried to explain here clearer.

Answered Nov 02 '22 07:11 by neural5torm