Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I use infinite range and operate over it?

Tags:

c#

linq

Enumerable.Range(0, int.MaxValue)
          .Select(n => Math.Pow(n, 2))
          .Where(squared => squared % 2 != 0)
          .TakeWhile(squared => squared < 10000).Sum()

Will this code iterate over all of the integer values from 0 to max-range or just through the integer values to satisfy the take-while, where, and select operators? Can somebody clarify?

EDIT: My first try to make sure it works as expected was dumb one. I revoke it :)

like image 816
suhair Avatar asked Mar 26 '11 17:03

suhair


2 Answers

int.MaxValue + 5 overflows to be a negative number. Try it yourself:

unchecked
{
    int count = int.MaxValue + 5;
    Console.WriteLine(count); // Prints -2147483644
}

The second argument for Enumerable.Range has to be non-negative - hence the exception.

You can certainly use infinite sequences in LINQ though. Here's an example of such a sequence:

public IEnumerable<int> InfiniteCounter()
{
    int counter = 0;
    while (true)
    {
        unchecked
        {
            yield return counter;
            counter++;
        }
    }
}

That will overflow as well, of course, but it'll keep going...

Note that some LINQ operators (e.g. Reverse) need to read all the data before they can yield their first result. Others (like Select) can just keep streaming results as they read them from the input. See my Edulinq blog posts for details of the behaviour of each operator (in LINQ to Objects).

like image 141
Jon Skeet Avatar answered Sep 20 '22 03:09

Jon Skeet


The way to solve these sort of questions in general, is to think about what's going on in steps.

Linq turns the linq code into something that'll be executed by query provider. This could be something like producing SQL code, or all manner of things. In the case of linq-to-objects, it produces some equivalent .NET code. Thinking about what that .NET code will be lets us reason about what will happen.*

With your code you have:

Enumerable.Range(0, int.MaxValue)
                         .Select(n => Math.Pow(n, 2))
                         .Where(squared => squared % 2 != 0)
                         .TakeWhile(squared => squared < 10000).Sum()

Enumerable.Range is slightly more complicated than:

for(int i = start; i != start + count; ++i)
  yield return i;

...but that's close enough for argument's sake.

Select is close enough to:

foreach(T item in source)
  yield return func(item);

Where is close enough to:

foreach(T item in source)
  if(func(item))
    yield return item;

TakeWhile is close enough to:

foreach(T item in source)
  if(func(item))
    yield return item;
  else
    yield break;

Sum is close enough to:

T tmp = 0;//must be numeric type
foreach(T x in source)
  tmp += x;
return tmp;

This simplifies a few optimisations and so on, but is close enough to reason with. Taking each of these in turn, your code is equivalent to:

double ret = 0; // part of equivalent of sum
for(int i = 0; i != int.MaxValue; ++i) // equivalent of Range
{
  double j = Math.Pow(i, 2);  // equivalent of Select(n => Math.Pow(n, 2))
  if(j % 2 != 0) //equivalent of Where(squared => squared %2 != 0)
  {
    if(j < 10000) //equivalent of TakeWhile(squared => squared < 10000)
    {
       ret += j; //equaivalent of Sum()
    }
    else //TakeWhile stopping further iteration
    {
      break;
    }
  }
}
return ret; //end of equivalent of Sum()

Now, in some ways the code above is simpler, and in some ways it's more complicated. The whole point of using LINQ is that in many ways its simpler. Still, to answer your question "Will this code iterate over all of the integer values from 0 to max-range or just through the integer values to satisfy the take-while, where, and select operators?" we can look at the above and see that those that don't satisfy the where are iterated through to find that they don't satisfy the where, but no more work is done with them, and once the TakeWhile is satisfied, all further work is stopped (the break in my non-LINQ re-write).

Of course it's only the TakeWhile() in this case that means the call will return in a reasonable length of time, but we also need to think briefly about the others to make sure they yield as they go. Consider the following variant of your code:

Enumerable.Range(0, int.MaxValue)
  .Select(n => Math.Pow(n, 2))
  .Where(squared => squared % 2 != 0)
  .ToList()
  .TakeWhile(squared => squared < 10000).Sum()

Theoretically, this will give exactly the same answer, but it will take far longer and far more memory to do so (probably enough to cause an out of memory exception). The equivalent non-linq code here though is:

List<double> tmpList = new List<double>(); // part of ToList equivalent
for(int i = 0; i != int.MaxValue; ++i) // equivalent of Range
{
  double j = Math.Pow(i, 2);  // equivalent of Select(n => Math.Pow(n, 2))
  if(j % 2 != 0) //equivalent of Where(squared => squared %2 != 0)
  {
    tmpList.Add(j);//part of equivalent to ToList()
  }
}
double ret = 0; // part of equivalent of sum
foreach(double k in tmpList)
{
  if(k < 10000) //equivalent of TakeWhile(squared => squared < 10000)
  {
     ret += k; //equaivalent of Sum()
  }
  else //TakeWhile stopping further iteration
  {
    break;
  }
}
return ret; //end of equivalent of Sum()

Here we can see how adding ToList() to the Linq query vastly affects the query so that every item produced by the Range() call must be dealt with. Methods like ToList() and ToArray() break up the chaining so that non-linq equivalents no longer fit "inside" each other and none can therefore stop the operation of those that come before. (Sum() is another example, but since it's after your TakeWhile() in your example, that isn't an issue).

Another thing that would make it go through every iteration of the range is if you had While(x => false) because it would never actually perform the test in TakeWhile.

*Though there may be further optimisations, esp in the case of SQL code and also while conceptually e.g. Count() is equivalent to:

int c = 0;
foreach(item in src)
  ++c;
return c;

That this will be turned into a call to the Count property of an ICollection or the Length property of an array means the O(n) above is replaced by an O(1) (for most ICollection implementations) call, which is a massive gain for large sequences.

like image 38
Jon Hanna Avatar answered Sep 21 '22 03:09

Jon Hanna