Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the Big O of linq .where?

Tags:

c#

big-o

linq

where

I am doing some comparisons about where to filter out items from a list. I am unsure of doing it directly which would be O(n), or using .Where(). I made a simple example to test .Where() on a simple data set. There are n=100 items, and when I run the debugger on the line in the function BigO() it goes exactly 100 times making me think that .Where() is also O(n). What I couldn't figure out was where the data was being stored during the operation and I wasn't sure if that was adding any increased complexity.

Am I missing something, or is .Where() O(n)?

public class ListerFactory
{

 public class Lister
 {
  bool includeItem { get; set; }
 }

 List<Lister> someList { get; set; }

 public ListerFactory()
 {
  someList = new List<Lister>();
  BuildLister();
 }    

 public void BuildLister()
 {
  for(int i = 0; i < 100; i++)
  {
   var inc = new Lister();
   inc.includeItem = i % 2;
   someList.Add(inc);
  }
  BigO();
 }

 public void BigO()
 {
  someList = someList.Where(thisList => thisList.includeItem == true).ToList();
 }
}
like image 893
Travis J Avatar asked Mar 25 '12 22:03

Travis J


1 Answers

Where() is O(1); it doesn't actually do any work.

Looping through the collection returned by Where() is O(n). ..

The O(n) that you're seeing is the result of ToList(), which is O(n).
If you pass a Where() query to an O(n2) algorithm, you will see the callback execute n2 times. (assuming the algorithm doesn't cache anywhere)

This is called deferred execution.

This is true about most if not all LINQ providers; it wouldn't make sense for a LINQ provider to eagerly execute all calls.


In the case of LINQ to objects, this assumes that the source collection's enumerator is O(n).
If you're using some strange collection which iterates in worse than O(n) (in other words, if its MoveNext() is worse than O(1)), Where() will be bounded by that.

To be more precise, the time complexity of enumerating a Where() query is the same as the time complexity of the original enumeration.

Similarly, I'm assuming that the callback is O(1).
If it isn't, you'll need to multiply the complexity of the callback by the complexity of the original enumeration.

like image 127
SLaks Avatar answered Oct 12 '22 11:10

SLaks