Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does a streaming operator differ from deferred execution?

Tags:

c#

linq

In LINQ Where is a streaming operator. Where-as OrderByDescending is a non-streaming operator. AFAIK, a streaming operator only gathers the next item that is necessary. A non-streaming operator evaluates the entire data stream at once.

I fail to see the relevance of defining a Streaming Operator. To me, it is redundant with Deferred Execution. Take the example where I have written a custom extension and consumed it using the where operator and and orderby.

public static class ExtensionStuff
{
    public static IEnumerable<int> Where(this IEnumerable<int> sequence, Func<int, bool> predicate)
    {
        foreach (int i in sequence)
        {
            if (predicate(i))
            {
                yield return i;
            }
        }
    }
}

    public static void Main()
    {
        TestLinq3();
    }

    private static void TestLinq3()
    {
        int[] items = { 1, 2, 3,4 };

        var selected = items.Where(i => i < 3)
                            .OrderByDescending(i => i);

        Write(selected);
    }



    private static void Write(IEnumerable<int> selected)
    {
        foreach(var i in selected)
            Console.WriteLine(i);
    }

In either case, Where needs to evaluate each element in order to determine which elements meet the condition. The fact that it yields seems to only become relevant because the operator gains deferred execution.

So, what is the importance of Streaming Operators?

like image 741
P.Brian.Mackey Avatar asked Apr 05 '12 21:04

P.Brian.Mackey


3 Answers

There are two aspects: speed and memory.

The speed aspect becomes more apparent when you use a method like .Take() to only consume a portion of the original result set.

// Consumes ten elements, yields 5 results.
Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
    .Take(5)
    .ToList();

// Consumes one million elements, yields 5 results.
Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
    .OrderByDescending(i => i)
    .Take(5)
    .ToList();

Because the first example uses only streaming operators before the call to Take, you only end up yielding values 1 through 10 before Take stops evaluating. Furthermore, only one value is loaded into memory at a time, so you have a very small memory footprint.

In the second example, OrderByDescending is not streaming, so the moment Take pulls the first item, the entire result that's passed through the Where filter has to be placed in memory for sorting. This could take a long time and produce a big memory footprint.

Even if you weren't using Take, the memory issue can be important. For example:

// Puts half a million elements in memory, sorts, then outputs them.
var numbers = Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
    .OrderByDescending(i => i);
foreach(var number in numbers) Console.WriteLine(number);

// Puts one element in memory at a time.
var numbers = Enumerable.Range(1, 1000000).Where(i => i % 2 == 0);
foreach(var number in numbers) Console.WriteLine(number);
like image 168
StriplingWarrior Avatar answered Oct 31 '22 17:10

StriplingWarrior


The fact that it yields seems to only become relevant because the operator gains deferred execution.

So, what is the importance of Streaming Operators?

I.e. you could not process infinite sequences with buffering / non-streaming extension methods - while you can "run" such a sequence (until you abort) just fine using only streaming extension methods.

Take for example this method:

public IEnumerable<int> GetNumbers(int start)
{
    int num = start;

    while(true)
    {
        yield return num;
        num++;
    }
}

You can use Where just fine:

foreach (var num in GetNumbers(0).Where(x => x % 2 == 0))
{
    Console.WriteLine(num);
}

OrderBy() would not work in this case since it would have to exhaustively enumerate the results before emitting a single number.

like image 20
BrokenGlass Avatar answered Oct 31 '22 18:10

BrokenGlass


Just to be explicit; in the case you mentioned there's no advantage to the fact that where streams, since the orderby sucks the whole thing in anyway. There are however times where the advantages of streaming are used (other answers/comments have given examples), so all LINQ operators stream to the best of their ability. Orderby streams as much as it can, which happens to be not very much. Where streams very effectively.

like image 20
Servy Avatar answered Oct 31 '22 19:10

Servy