Our server app has several methods, called in sequence, that iterate through a 20M-row result set and transform it. Each method in this pipeline stores a 200+ megabyte copy of the data, with a predictably bad impact on RAM usage and GC performance.
Each method follows a similar pattern:
public List<RowType> Step1(SomeType sourceData)
{
    // Materializes every transformed row before returning.
    var transformed = new List<RowType>();
    using (var foo = InitializeSomethingExpensive(sourceData))
    {
        foreach (var row in foo)
        {
            transformed.Add(TransformRow(row));
        }
    }
    return transformed;
}
Then these methods are called in a pipeline, e.g.
var results1 = Step1(sourceData);
var results2 = Step2(results1);
var results3 = Step3(results2);
...
var finalResults = StepN(resultsNMinus1);
return finalResults; // final results
I'd like to transform this into a more functional solution that iterates through the original source data without ever holding the entire dataset in RAM. I want to end up with a List of the final results without any intermediate collections.
If there were no setup required at each stage of the pipeline, then the solution would be simple: just run each transformation for each row and store only the final result.
var transformed = new List<SmallResult>();
// TODO: How to set up and ensure teardown of the *other* pipeline steps?
using (var foo = InitializeSomethingExpensive(sourceData))
{
    foreach (var row in foo)
    {
        object result = row;
        foreach (var step in Pipeline)
        {
            result = step.Transform(result);
        }
        transformed.Add(result as SmallResult);
    }
}
return transformed;
But today, each of those separate pipeline steps has its own expensive setup and teardown process that's enforced via a using block.
What's a good pattern to refactor each of these pipeline methods so that the setup/teardown code is guaranteed to run, while ending up with something like the pseudo-code above?
It's not practical to combine all the using blocks into a single method, because the code in each of these steps is long and shared, and I don't want to duplicate that shared code in one method.
I know I could manually replace the using block with try/finally, but doing that manually for multiple resources seems harder than necessary.
Is there a simpler solution, e.g. combining using and yield together in a smart way? Or is there a good "multi-using" class implementation available that makes this coordinated setup/teardown process easy (e.g. its constructor accepts a list of functions that return IDisposable, and its Dispose() implementation ensures that everything is cleaned up)?
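For the second option, a minimal sketch of such a "multi-using" class might look like the following. The name MultiDisposable is hypothetical (it is not a framework type), and the design choices are assumptions: factories run in order, resources are disposed in reverse order (mirroring nested using blocks), anything already acquired is released if a later factory throws, and exceptions thrown by Dispose calls are collected into an AggregateException.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical "multi-using" helper (not a BCL type). It acquires resources
// in order and releases them in reverse order, like nested using blocks.
public sealed class MultiDisposable : IDisposable
{
    private readonly List<IDisposable> _acquired = new List<IDisposable>();
    private bool _disposed;

    public MultiDisposable(params Func<IDisposable>[] factories)
    {
        try
        {
            foreach (var factory in factories)
            {
                _acquired.Add(factory());
            }
        }
        catch
        {
            // A later factory failed: release what was already acquired,
            // then rethrow the original exception.
            try { Dispose(); } catch { /* the original failure takes priority */ }
            throw;
        }
    }

    public void Dispose()
    {
        if (_disposed)
        {
            return;
        }
        _disposed = true;

        List<Exception> errors = null;

        // Tear down in reverse order of setup; keep going even if one Dispose throws.
        for (int i = _acquired.Count - 1; i >= 0; i--)
        {
            try
            {
                _acquired[i]?.Dispose();
            }
            catch (Exception ex)
            {
                if (errors == null)
                {
                    errors = new List<Exception>();
                }
                errors.Add(ex);
            }
        }
        _acquired.Clear();

        if (errors != null)
        {
            throw new AggregateException(errors);
        }
    }
}
```

A step could then write using (new MultiDisposable(() => OpenA(), () => OpenB())) { ... }, where OpenA and OpenB are placeholders for the real setup calls. Note that this only groups lifetimes; on its own it does not make rows stream through the steps.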
Seems like this is a pattern that someone smarter than I has already figured out, so asking here before re-inventing the wheel.
I'm not sure why you are creating so many disposable objects (you could clean these up with iterator methods that use yield), but you can create an extension method to clean up this pattern for you:
public static class ToolsEx
{
    public static IEnumerable<T> EnumerateAndDispose<X, T>(this X input,
        Func<X, IEnumerable<T>> func)
        where X : IDisposable
    {
        using (var mc = input)
            foreach (var i in func(mc))
                yield return i;
    }
}
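This works because a using block inside an iterator method becomes a try/finally in the compiler-generated enumerator: Dispose runs when enumeration completes, when the body throws, or when the consumer disposes the enumerator early (foreach does this on break). The sketch below repeats the extension method and checks the early-exit case with a hypothetical RowSource class standing in for an expensive resource. One caveat: the argument (e.g. new MyClass(...)) is constructed eagerly, but its Dispose only runs once the sequence is actually enumerated, so a query that is built but never enumerated never disposes it, and enumerating it twice would reuse an object that has already been disposed.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ToolsEx
{
    // Same extension method as above: the using block lives inside an iterator,
    // so input.Dispose() runs when the enumerator finishes or is disposed early.
    public static IEnumerable<T> EnumerateAndDispose<X, T>(this X input,
        Func<X, IEnumerable<T>> func)
        where X : IDisposable
    {
        using (var mc = input)
        {
            foreach (var i in func(mc))
            {
                yield return i;
            }
        }
    }
}

// Hypothetical stand-in for an expensive, disposable row source.
public sealed class RowSource : IEnumerable<int>, IDisposable
{
    private readonly int _count;

    public RowSource(int count)
    {
        _count = count;
    }

    public bool IsDisposed { get; private set; }

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = 0; i < _count; i++)
        {
            yield return i;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public void Dispose() => IsDisposed = true;
}
```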
You can use it like this:
var query = from x in new MyClass(0, 0, 2).EnumerateAndDispose(i => i)
            from y in new MyClass(1, x, 3).EnumerateAndDispose(i => i)
            select new
            {
                x,
                y,
            };
foreach (var i in query)
    Console.WriteLine(i);
... output ...
{ x = 0, y = 0 }
{ x = 0, y = 1 }
{ x = 0, y = 2 }
Disposed: 1/0
{ x = 1, y = 0 }
{ x = 1, y = 1 }
{ x = 1, y = 2 }
Disposed: 1/1
Disposed: 0/0
Here is a pipeline example with Aggregate
...
var query = from x in new MyClass(0, 0, 2).EnumerateAndDispose(i => i)
            let r = new MyClass(1, x, 3).EnumerateAndDispose(i => i)
                .Aggregate(x, (a, i) => (a + i) * 2)
            select new
            {
                x,
                r,
            };
... and the results ...
Disposed: 1/0
{ x = 0, r = 8 }
Disposed: 1/1
{ x = 1, r = 16 }
Disposed: 0/0
... test class for the example ...
using System;
using System.Collections.Generic;
using System.Linq;

public class MyClass : IEnumerable<int>, IDisposable
{
    public MyClass(int set, int set2, int size)
    {
        this.Size = size;
        this.Set = set;
        this.Set2 = set2;
    }

    public IEnumerator<int> GetEnumerator()
    {
        foreach (var i in Enumerable.Range(0, this.Size))
            yield return i;
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    // Logs which instance was disposed, so the output shows when teardown happens.
    public void Dispose()
    {
        Console.WriteLine("Disposed: {0}/{1}", this.Set, this.Set2);
    }

    public int Size { get; private set; }
    public int Set { get; private set; }
    public int Set2 { get; private set; }
}
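Applied to the pipeline in the question, the same idea means rewriting each step as an iterator method that keeps its own using block and yields rows instead of filling a List, so only the final ToList() materializes anything. This is a sketch under stated assumptions: Step1, Step2 and ExpensiveStage are hypothetical stand-ins for the real steps and for InitializeSomethingExpensive, and rows are plain ints. Because iterators are lazy, a step's setup runs on its first MoveNext, so the outermost step (Step2) is set up first and torn down last, while Step1 is torn down as soon as its input is exhausted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a step's expensive setup/teardown.
public sealed class ExpensiveStage : IDisposable
{
    private readonly string _name;
    private readonly List<string> _log;

    public ExpensiveStage(string name, List<string> log)
    {
        _name = name;
        _log = log;
        _log.Add("setup " + name);
    }

    public void Dispose() => _log.Add("teardown " + _name);
}

public static class StreamingPipeline
{
    // Each step keeps its own using block but yields one row at a time
    // instead of building a List, so no intermediate collection exists.
    public static IEnumerable<int> Step1(IEnumerable<int> rows, List<string> log)
    {
        using (new ExpensiveStage("step1", log))
        {
            foreach (var row in rows)
            {
                yield return row * 10;
            }
        }
    }

    public static IEnumerable<int> Step2(IEnumerable<int> rows, List<string> log)
    {
        using (new ExpensiveStage("step2", log))
        {
            foreach (var row in rows)
            {
                yield return row + 1;
            }
        }
    }

    // Only the final ToList() materializes data.
    public static List<int> Run(IEnumerable<int> source, List<string> log)
    {
        return Step2(Step1(source, log), log).ToList();
    }
}
```

With more steps, Run would become StepN(...Step2(Step1(source))...).ToList(). Since each using compiles to a try/finally inside its iterator, an exception in any step still tears down every step that has already been set up.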