In LINQ Where
is a streaming operator. Where-as OrderByDescending
is a non-streaming operator. AFAIK, a streaming operator only gathers the next item that is necessary. A non-streaming operator evaluates the entire data stream at once.
I fail to see the relevance of defining a Streaming Operator. To me, it is redundant with Deferred Execution. Take the example where I have written a custom extension and consumed it using the where operator and and orderby.
public static class ExtensionStuff
{
public static IEnumerable<int> Where(this IEnumerable<int> sequence, Func<int, bool> predicate)
{
foreach (int i in sequence)
{
if (predicate(i))
{
yield return i;
}
}
}
}
public static void Main()
{
TestLinq3();
}
private static void TestLinq3()
{
int[] items = { 1, 2, 3,4 };
var selected = items.Where(i => i < 3)
.OrderByDescending(i => i);
Write(selected);
}
private static void Write(IEnumerable<int> selected)
{
foreach(var i in selected)
Console.WriteLine(i);
}
In either case, Where
needs to evaluate each element in order to determine which elements meet the condition. The fact that it yields seems to only become relevant because the operator gains deferred execution.
So, what is the importance of Streaming Operators?
There are two aspects: speed and memory.
The speed aspect becomes more apparent when you use a method like .Take()
to only consume a portion of the original result set.
// Consumes ten elements, yields 5 results.
Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
.Take(5)
.ToList();
// Consumes one million elements, yields 5 results.
Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
.OrderByDescending(i => i)
.Take(5)
.ToList();
Because the first example uses only streaming operators before the call to Take
, you only end up yielding values 1 through 10 before Take
stops evaluating. Furthermore, only one value is loaded into memory at a time, so you have a very small memory footprint.
In the second example, OrderByDescending
is not streaming, so the moment Take pulls the first item, the entire result that's passed through the Where
filter has to be placed in memory for sorting. This could take a long time and produce a big memory footprint.
Even if you weren't using Take
, the memory issue can be important. For example:
// Puts half a million elements in memory, sorts, then outputs them.
var numbers = Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
.OrderByDescending(i => i);
foreach(var number in numbers) Console.WriteLine(number);
// Puts one element in memory at a time.
var numbers = Enumerable.Range(1, 1000000).Where(i => i % 2 == 0);
foreach(var number in numbers) Console.WriteLine(number);
The fact that it yields seems to only become relevant because the operator gains deferred execution.
So, what is the importance of Streaming Operators?
I.e. you could not process infinite sequences with buffering / non-streaming extension methods - while you can "run" such a sequence (until you abort) just fine using only streaming extension methods.
Take for example this method:
public IEnumerable<int> GetNumbers(int start)
{
int num = start;
while(true)
{
yield return num;
num++;
}
}
You can use Where
just fine:
foreach (var num in GetNumbers(0).Where(x => x % 2 == 0))
{
Console.WriteLine(num);
}
OrderBy()
would not work in this case since it would have to exhaustively enumerate the results before emitting a single number.
Just to be explicit; in the case you mentioned there's no advantage to the fact that where streams, since the orderby sucks the whole thing in anyway. There are however times where the advantages of streaming are used (other answers/comments have given examples), so all LINQ operators stream to the best of their ability. Orderby streams as much as it can, which happens to be not very much. Where streams very effectively.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With