Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is this closure combination behaviour a C# compiler bug?

I was investigating some strange object lifetime issues, and came across this very puzzling behaviour of the C# compiler:

Consider the following test class:

class Test
{
    delegate Stream CreateStream();

    CreateStream TestMethod( IEnumerable<string> data )
    {
        string file = "dummy.txt";
        var hashSet = new HashSet<string>();

        var count = data.Count( s => hashSet.Add( s ) );

        CreateStream createStream = () => File.OpenRead( file );

        return createStream;
    }
}

The compiler generates the following:

internal class Test
{
  public Test()
  {
    base..ctor();
  }

  private Test.CreateStream TestMethod(IEnumerable<string> data)
  {
    Test.<>c__DisplayClass1_0 cDisplayClass10 = new Test.<>c__DisplayClass1_0();
    cDisplayClass10.file = "dummy.txt";
    cDisplayClass10.hashSet = new HashSet<string>();
    Enumerable.Count<string>(data, new Func<string, bool>((object) cDisplayClass10, __methodptr(<TestMethod>b__0)));
    return new Test.CreateStream((object) cDisplayClass10, __methodptr(<TestMethod>b__1));
  }

  private delegate Stream CreateStream();

  [CompilerGenerated]
  private sealed class <>c__DisplayClass1_0
  {
    public HashSet<string> hashSet;
    public string file;

    public <>c__DisplayClass1_0()
    {
      base..ctor();
    }

    internal bool <TestMethod>b__0(string s)
    {
      return this.hashSet.Add(s);
    }

    internal Stream <TestMethod>b__1()
    {
      return (Stream) File.OpenRead(this.file);
    }
  }
}

The original class contains two lambdas: s => hashSet.Add( s ) and () => File.OpenRead( file ). The first closes over local variable hashSet, the second closes over local variable file. However, the compiler generates a single closure implementation class <>c__DisplayClass1_0 that contains both hashSet and file. As a consequence, the returned CreateStream delegate contains and keeps alive a reference to the hashSet object that should have been available for GC once TestMethod returned.

In the actual scenario where I've encountered this issue, a very substantial (ie, >100mb) object is incorrectly enclosed.

My specific questions are:

  1. Is this a bug? If not, why is this behaviour considered desirable?

Update:

The C# 5 spec 7.15.5.1 says:

When an outer variable is referenced by an anonymous function, the outer variable is said to have been captured by the anonymous function. Ordinarily, the lifetime of a local variable is limited to execution of the block or statement with which it is associated (§5.1.7). However, the lifetime of a captured outer variable is extended at least until the delegate or expression tree created from the anonymous function becomes eligible for garbage collection.

This would appear to be open to some degree of interpretation, and does not explicitly prohibit a lambda from capturing variables that it does not reference. However, this question covers a related scenario, which @eric-lippert considered to be a bug. IMHO, I see the combined closure implementation provided by the compiler as a good optimization, but that the optimization should not be used for lambdas which the compiler can reasonably detect may have lifetime beyond the current stack frame.


  1. How do I code against this without abandoning the use of lambdas all together? Notably how do I code against this defensively, so that future code changes don't suddenly cause some other unchanged lambda in the same method to start enclosing something that it shouldn't?

Update:

The code example I've provided is by necessity contrived. Clearly, refactoring lambda creation out to a separate method works around the problem. My question is not intended to be about design best practices (as well covered by @peter-duniho). Rather, given the content of the TestMethod as it stands, I'd like to know if there is any way to coerce the compiler to exclude the createStream lambda from the combined closure implementation.


For the record, I'm targeting .NET 4.6 with VS 2015.

like image 560
tg73 Avatar asked Nov 24 '15 13:11

tg73


2 Answers

Is this a bug?

No. The compiler is compliant with the specification here.

Why is this behaviour considered desirable?

It's not desirable. It's deeply unfortunate, as you discovered here, and as I described back in 2007:

http://blogs.msdn.com/b/ericlippert/archive/2007/06/06/fyi-c-and-vb-closures-are-per-scope.aspx

The C# compiler team has considered fixing this in every version since C# 3.0 and it has never been high enough priority. Consider entering an issue on the Roslyn github site (if there isn't one already; there may well be).

I personally would like to see this fixed; as it stands it is a big "gotcha".

How do I code against this without abandoning the use of lambdas all together?

The variable is the thing that is captured. You could set the hashset variable to null when you're done with it. Then the only memory being consumed is the memory for the variable, four bytes, and not the memory for the thing it is referring to, which will be collected.

like image 118
Eric Lippert Avatar answered Nov 18 '22 23:11

Eric Lippert


I'm not aware of anything in the C# language specification that would dictate exactly how a compiler is to implement anonymous methods and variable capturing. This is an implementation detail.

What the specification does do is set some rules for how the anonymous methods and their capture variables are required to behave. I don't have a copy of the C# 6 specification, but here is relevant text from the C# 5 specification, under "7.15.5.1 Captured outer variables":

…the lifetime of a captured outer variable is extended at least until the delegate or expression tree created from the anonymous function becomes eligible for garbage collection. [emphasis mine]

There is nothing in the specification that limits the lifetime of the variable. The compiler is simply required to make sure the variable lives long enough to remain valid if needed by the anonymous method.

So…

1.Is this a bug? If not, why is this behaviour considered desirable?

Not a bug. The compiler is complying with the specification.

As for whether it's considered "desirable", that's a loaded term. What's "desirable" depends on your priorities. That said, one priority of a compiler author is to simplify the task of the compiler (and in doing so, making it run faster and reduce the chances of bugs). This particular implementation might be considered "desirable" in that context.

On the other hand, language designers and compiler authors both also have a shared goal of helping programmers produce working code. Inasmuch as an implementation detail can interfere with this, such an implementation detail might be considered "undesirable". Ultimately, it's a matter of how each of those priorities are ranked, according to their potentially competing goals.

2.How do I code against this without abandoning the use of lambdas all together? Notably how do I code against this defensively, so that future code changes don't suddenly cause some other unchanged lambda in the same method to start enclosing something that it shouldn't?

Hard to say without a less-contrived example. In general, I'd say the obvious answer is "don't mix your lambdas like that". In your particular (admittedly contrived) example, you have one method that appears to be doing two completely different things. This is generally frowned upon for a variety of reasons, and it seems to me that this example just adds to that list.

I don't know what the best way to fix the "two different things" would be, but an obvious alternative would be to at least refactor the method so that the "two different things" method is delegating the work to another two methods, each named descriptively (which has the bonus benefit of helping the code to be self-documenting).

For example:

CreateStream TestMethod( IEnumerable<string> data )
{
    string file = "dummy.txt";
    var hashSet = new HashSet<string>();

    var count = AddAndCountNewItems(data, hashSet);

    CreateStream createStream = GetCreateStreamCallback(file);

    return createStream;
}

int AddAndCountNewItems(IEnumerable<string> data, HashSet<string> hashSet)
{
    return data.Count( s => hashSet.Add( s ) );
}

CreateStream GetCreateStreamCallback(string file)
{
    return () => File.OpenRead( file );
}

In this way, the captured variables remain independent. Even if the compiler does for some bizarre reason still put them both into the same closure type, it still should not result in the same instance of that type used between the two closures.

Your TestMethod() still does two different things, but at least it doesn't itself contain those two unrelated implementations. The code is more readable and better compartmentalized, which is a good thing even besides the fact that it fixes the variable lifetime issue.

like image 35
Peter Duniho Avatar answered Nov 18 '22 22:11

Peter Duniho