Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Allowing iteration without generating any garbage

I have the following code in an object pool that implements the IEnumerable interface.

public IEnumerable<T> ActiveNodes
{
    get
    {
        for (int i = 0; i < _pool.Count; i++)
        {
            if (_pool[i].AvailableInPool)
            {
                yield return _pool[i];
            }
        }
    }
}

As far as I know (according to this question), this will generate garbage as the IEnumerable object will need to be collected. None of the elements in _pool will ever be collected, as the purpose of the pool is to keep references to all of them to prevent garbage creation.

Can anyone suggest a way to allow iteration over _pool so that no garbage is generated?

When iterating over pool, all of the items in pool that have AvailableInPool == true should be iterated over. Order doesn't matter.

like image 892
Olhovsky Avatar asked Mar 30 '11 13:03

Olhovsky


4 Answers

First off, a number of people are pushing back on Olhovsky to suggest that this is worrying about nothing. Avoiding collection pressure is actually very important in some applications on some environments.

The compact framework garbage collector has an unsophisticated policy; it triggers a collection every time 1000KB of memory has been allocated. Now suppose you are writing a game that runs on the compact framework, and the physics engine generates 1KB of garbage every time it runs. Physics engines are typically run on the order of 20 times a second. So that's 1200KB of pressure per minute, and hey, that's already more than one collection per minute just from the physics engine. If the collection causes a noticable stutter in the game then that might be unacceptable. In such a scenario, anything you can do to decrease collection pressure helps.

I am learning this myself the hard way, even though I work on the desktop CLR. We have scenarios in the compiler where we must avoid collection pressure, and we are jumping through all kinds of object pooling hoops to do so. Olhovsky, I feel your pain.

So, to come to your question, how can you iterate over the collection of pooled objects without creating collection pressure?

First, let's think about why collection pressure happens in the typical scenario. Suppose you have

 foreach(var node in ActiveNodes) { ... }

Logically this allocates two objects. First, it allocates the enumerable -- the sequence -- that represents the sequence of nodes. Second, it allocates the enumerator -- the cursor -- that represents the current position in the sequence.

In practice sometimes you can cheat a bit and have one object that represents both the sequence and the enumerator, but you still have one object allocated.

How can we avoid this collection pressure? Three things come to mind.

1) Don't make an ActiveNodes method in the first place. Make the caller iterate over the pool by index, and check themselves whether the node is available. The sequence is then the pool, which is already allocated, and the cursor is an integer, neither of which are creating new collection pressure. The price you pay is duplicated code.

2) As Steven suggests, the compiler will take any types that have the right public methods and properties; they don't have to be IEnumerable and IEnumerator. You can make your own mutable-struct sequence and cursor objects, pass those around by value, and avoid collection pressure. It is dangerous to have mutable structs, but it is possible. Note that List<T> uses this strategy for its enumerator; study its implementation for ideas.

3) Allocate the sequence and the enumerators on the heap normally and pool them too! You're already going with a pooling strategy, so there's no reason why you can't pool an enumerator as well. Enumerators even have a convenient "Reset" method that usually just throws an exception, but you could write a custom enumerator object that used it to reset the enumerator back to the beginning of the sequence when it goes back in the pool.

Most objects are only enumerated once at a time, so the pool can be small in typical cases.

(Now, of course you may have a chicken-and-egg problem here; how are you going to enumerate the pool of enumerators?)

like image 138
Eric Lippert Avatar answered Nov 19 '22 20:11

Eric Lippert


Iterating items will in any 'normal' design usually result in the creation of a new enumerable object. Creating and disposing objects is very fast, so only in very special scenarios (where low latency is the top most priority) garbage collections could (I say 'could') be a problem.

A design without garbage is possible by returning structures that don't implement IEnumerable. The C# compiler can still iterate such objects, because the foreach statement uses duck typing. .NET's List<T>, for instance, takes this approach.

When using foreach over both an array and List<T>, no garbage will be generated. When using foreach on an array, C# will transform the operation to a for statement, while List<T> already implements a struct enumerator, causing the foreach to produce no garbage.

Here is a struct enumerable and struct enumerator. When you return the enumerable, the C# compiler can foreach over it:

public struct StructEnumerable<T>
{
    private readonly List<T> pool;

    public StructEnumerable(List<T> pool)
    {
        this.pool = pool;
    }

    public StructEnumerator<T> GetEnumerator()
    {
        return new StructEnumerator<T>(this.pool);
    }
}

Here is the StructEnumerator:

public struct StructEnumerator<T>
{
    private readonly List<T> pool;
    private int index;

    public StructEnumerator(List<T> pool)
    {
        this.pool = pool;
        this.index = 0;
    }

    public T Current
    {
        get
        {
            if (this.pool == null || this.index == 0)
                throw new InvalidOperationException();

            return this.pool[this.index - 1];
        }
    }

    public bool MoveNext()
    {
        this.index++;
        return this.pool != null && this.pool.Count >= this.index;
    }

    public void Reset()
    {
        this.index = 0;
    }
}

You can simply return the StructEnumerable<T> as follows:

public StructEnumerable<T> Items
{
    get { return new StructEnumerable<T>(this.pool); }
}

And C# can iterate over this with a normal foreach:

foreach (var item in pool.Items)
{
    Console.WriteLine(item);
}

Note that you can't LINQ over the item using System.Linq.Enumerable You need the IEnumerable<T> interface for that, and that involves creating enumerators and, therefore, garbage collection. You could, of course, build your own LINQ extension methods, but that will unlikely help, because that will often still result in new objects being created (when closures are being generated for used delegates).

like image 42
Steven Avatar answered Nov 19 '22 21:11

Steven


Since XNA for XBox also works over the Compact Framework (and I suspect that's what you're working on given the hints you've given(1)), we can trust the XNA devs to teach us exactly when foreach creates garbage.

To quote the most relevant point (although the entire article's worth reading):

When doing a foreach over a Collection<T> an enumerator WILL be allocated.

When doing a foreach over most other collections, including as arrays, lists, queues, linked lists, and others:

  • if the collections are used explicitly, an enumerator will NOT be allocated.
  • if they are cast to interfaces, an enumerator WILL be allocated.

So, if _pool is a List, array or similar and can afford to, you can either return that type directly or cast the IEnumerable<T> to the respective type to avoid garbage during the foreach.

As some additional reading, Shawn Hargreaves can have some useful additional information.

(1) 60 calls per second, Compact Framework, can't go down to native code, 1MB of allocation before a GC is triggered.

like image 6
Kyte Avatar answered Nov 19 '22 22:11

Kyte


Does it have to be IEnumerable? Will refactoring to an array with good old indexed acecss help?

like image 1
Seva Alekseyev Avatar answered Nov 19 '22 21:11

Seva Alekseyev