List queries 20 times faster than IQueryable?

Here is a test that I set up this evening. It was meant to prove something different, but the outcome was not quite what I expected.

I'm running a test with 10000 random queries on an IQueryable, and while testing I found out that if I do the same on a List, my test is 20 times faster.

See below. My CarBrandManager.GetList originally returns an IQueryable, but now I first call ToList() on it, and then it's way faster.

Can anyone tell me why I see this big difference?

var sw = new Stopwatch();
sw.Start();

int queries = 10000;

//IQueryable<Model.CarBrand> carBrands = CarBrandManager.GetList(context);
List<Model.CarBrand> carBrands = CarBrandManager.GetList(context).ToList();

Random random = new Random();
int randomChar = 65;

for (int i = 0; i < queries; i++)
{
    // Random.Next's upper bound is exclusive, so this yields 65 ('A') through 89 ('Y').
    randomChar = random.Next(65, 90);
    Model.CarBrand carBrand = carBrands.Where(x => x.Name.StartsWith(((char)randomChar).ToString())).FirstOrDefault();
}

sw.Stop();
lblStopWatch.Text = String.Format("Queries: {0} Elapsed ticks: {1}", queries, sw.ElapsedTicks);
Tys asked Oct 25 '12 22:10


1 Answer

There are potentially two issues at play here. First: It's not obvious what type of collection is returned from GetList(context), apart from the knowledge that it implements IQueryable. That means when you evaluate the result, it could very well be creating an SQL query, sending that query to a database, and materializing the result into objects. Or it could be parsing an XML file. Or downloading an RSS feed or invoking an OData endpoint on the internet. These would obviously take more time than simply filtering a short list in memory. (After all, how many car brands can there really be?)

But let's suppose that the implementation it returns is actually a List, and therefore the only difference you're testing is whether it's treated as an IEnumerable or as an IQueryable. Compare the method signatures on the Enumerable class's extension methods with those on Queryable. When you treat the list as an IQueryable, you are passing in Expressions (expression trees such as Expression<Func<T, bool>>), which have to be evaluated before anything can run, rather than plain Funcs, which can be invoked directly.
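To make the signature difference concrete: Enumerable.Where takes a Func<TSource, bool>, while Queryable.Where takes an Expression<Func<TSource, bool>>. The sketch below assigns the same lambda to both types. The class and member names are made up for illustration and the hard-coded "A" prefix is an assumption.

```csharp
using System;
using System.Linq.Expressions;

public static class DelegateVersusExpression
{
    // Assigned to a Func, the lambda is compiled code that can be invoked directly.
    public static readonly Func<string, bool> AsDelegate =
        name => name.StartsWith("A");

    // Assigned to an Expression, the same lambda becomes a data structure
    // (an expression tree) that describes the code instead of running it.
    public static readonly Expression<Func<string, bool>> AsExpression =
        name => name.StartsWith("A");

    // Anything that wants to run the expression in memory has to turn it
    // into a delegate first, for example with Compile().
    public static bool RunExpression(string name)
    {
        Func<string, bool> compiled = AsExpression.Compile();
        return compiled(name);
    }
}
```

The body of AsExpression is a method-call node that a provider can inspect and translate; AsDelegate offers nothing to inspect, only something to call.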

When you're using a custom LINQ provider like Entity Framework, this gives the framework the ability to evaluate the actual expression trees and produce a SQL query and materialization plan from them. However, LINQ to Objects just wants to evaluate the lambda expressions in memory, so it has to either use reflection or compile the expressions into Funcs, both of which carry a large performance hit.
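If you want to isolate that overhead from any database cost, a rough sketch like the following runs the same filter against a List and against the same List wrapped with AsQueryable(). The brand names, class name, and helper methods are assumptions for illustration; this is not the code from the question, and the timings will vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class QueryableOverheadSketch
{
    // Hypothetical stand-in for the car brand names.
    private static readonly List<string> Brands =
        new List<string> { "Audi", "BMW", "Citroen", "Fiat", "Volvo" };

    // Binds to Enumerable.Where: the lambda is an ordinary delegate.
    public static int CountHitsWithList(int queries)
    {
        int hits = 0;
        for (int i = 0; i < queries; i++)
        {
            string prefix = ((char)('A' + i % 26)).ToString();
            if (Brands.Where(x => x.StartsWith(prefix)).FirstOrDefault() != null) hits++;
        }
        return hits;
    }

    // Binds to Queryable.Where: every iteration builds an expression tree
    // that the in-memory provider must process before it can filter.
    public static int CountHitsWithQueryable(int queries)
    {
        IQueryable<string> queryable = Brands.AsQueryable();
        int hits = 0;
        for (int i = 0; i < queries; i++)
        {
            string prefix = ((char)('A' + i % 26)).ToString();
            if (queryable.Where(x => x.StartsWith(prefix)).FirstOrDefault() != null) hits++;
        }
        return hits;
    }

    // Times one of the two methods above and returns the elapsed ticks.
    public static long MeasureTicks(Func<int, int> run, int queries)
    {
        Stopwatch sw = Stopwatch.StartNew();
        run(queries);
        sw.Stop();
        return sw.ElapsedTicks;
    }
}
```

Comparing MeasureTicks(QueryableOverheadSketch.CountHitsWithList, 10000) with MeasureTicks(QueryableOverheadSketch.CountHitsWithQueryable, 10000) should show whether the gap you measured comes from expression handling alone. Both methods return the same hit count, so only the time differs.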

You may be tempted to just call .ToList() or .AsEnumerable() on the result set to force it to use Funcs, but from an information hiding perspective this would be a mistake. You would be assuming that you know that the data returned from the GetList(context) method is some kind of in-memory object. That may be the case at the moment, or it may not. Regardless, it's not part of the contract that is defined for the GetList(context) method, and therefore you cannot assume it will always be that way. You have to assume that the type you get back could very well be something that you can query. And even though there are probably only a dozen car brands to search through at the moment, it's possible that some day there will be thousands (I'm talking in terms of programming practice here, not necessarily saying this is the case with the car industry). So you shouldn't assume that it will always be faster to download the entire list of cars and filter them in memory, even if that happens to be the case right now.

If CarBrandManager.GetList(context) might return an object backed by a custom LINQ provider (like an Entity Framework collection), then you probably want to leave the data typed as an IQueryable. Even though your benchmark shows the list being 20 times faster, the absolute difference is so small that no user is ever going to be able to tell. You may one day see performance gains of several orders of magnitude by calling .Where().Take().Skip() and only loading the data you really need from the data store, whereas you'd end up loading the whole table into your system's memory if you call .ToList() right off the bat.
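As a rough illustration of why keeping the IQueryable matters: filter and paging calls only compose a query, and nothing is read until the result is enumerated. The sketch below uses a counting in-memory source as a hypothetical stand-in for a data store. A real provider such as Entity Framework would translate the composed query rather than iterate rows like this, and all names here are invented.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredQuerySketch
{
    // Counts how many items have been pulled from the hypothetical data source.
    public static int ItemsRead;

    // Stand-in for a data store: each row handed out bumps the counter.
    private static IEnumerable<string> ReadBrands()
    {
        string[] rows = { "Audi", "Alfa Romeo", "Aston Martin", "BMW", "Bentley" };
        foreach (string row in rows)
        {
            ItemsRead++;
            yield return row;
        }
    }

    // Plays the role of a GetList method that returns an IQueryable.
    public static IQueryable<string> GetList()
    {
        return ReadBrands().AsQueryable();
    }

    // Composes filter and paging; no rows are read until the caller enumerates.
    public static IQueryable<string> PageOf(IQueryable<string> brands, string prefix, int skip, int take)
    {
        return brands.Where(x => x.StartsWith(prefix)).Skip(skip).Take(take);
    }
}
```

Calling PageOf(DeferredQuerySketch.GetList(), "A", 1, 1) leaves ItemsRead untouched; only a later ToList() or foreach pulls rows. Calling ToList() on GetList() first would read every row before any filtering happens.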

However, if you know that CarBrandManager.GetList(context) will always return an in-memory list (as the name implies), it should be changed to return an IEnumerable<Model.CarBrand> instead of an IQueryable<Model.CarBrand>. Or, if you're on .NET 4.5, perhaps an IReadOnlyList<Model.CarBrand> or IReadOnlyCollection<Model.CarBrand>, depending on what contract you're willing to force your CarBrandManager to abide by.
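A minimal sketch of that second option, assuming the brands really do live in memory. The CarBrand and InMemoryCarBrandManager types below are hypothetical stand-ins, not your actual Model.CarBrand or CarBrandManager, and the context parameter is left out for brevity.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Model.CarBrand.
public sealed class CarBrand
{
    public CarBrand(string name) { Name = name; }
    public string Name { get; private set; }
}

// Hypothetical manager whose contract promises an in-memory, read-only list.
public static class InMemoryCarBrandManager
{
    private static readonly List<CarBrand> Brands = new List<CarBrand>
    {
        new CarBrand("Audi"),
        new CarBrand("BMW"),
        new CarBrand("Volvo")
    };

    // IReadOnlyList<T> is available from .NET 4.5 onwards.
    public static IReadOnlyList<CarBrand> GetList()
    {
        return Brands.AsReadOnly();
    }

    // LINQ calls on the result bind to Enumerable, so the lambda is a plain Func.
    public static CarBrand FirstStartingWith(string prefix)
    {
        return GetList().FirstOrDefault(x => x.Name.StartsWith(prefix));
    }
}
```

With this return type, callers get the fast delegate-based path without having to call ToList() themselves, and the signature no longer suggests that the result can be queried against a data store.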

StriplingWarrior answered Oct 15 '22 04:10