
Does Entity Framework select all rows into memory when applying a Where filter?

One thing confuses me: I think EF selects all rows (all records) in the table.

Let me show you a sample.

public Category GetByID(int Id)
{
    return context.Categories.Find(Id);
}

There are a lot of records in the table, and when I inspect them at a breakpoint I can see all of the records, not only the one with the given Id. What if there are 10k records in the table? I tested this: I manually duplicated the records in the database until there were 30k.

An expression like this,

IEnumerable<Category> categories = categoryRepository
      .Where(x => x.Published == true)
      .ToList();

At a breakpoint I saw 30k records, but at least 10k of the rows in the table have Published set to false.

Does Entity Framework first fetch all of the records into memory and then filter them?

Bulut Paradise asked May 27 '26 06:05


1 Answer

TL;DR

It's likely your categoryRepository has broken EF's IQueryable<> expression tree, and is materializing the entire Category table in order to run the .Where predicate. See the examples below.

More Detail

The short answer is no. Provided that Entity Framework is able to parse the IQueryable<> expression (which includes the .Where predicates you specify), it converts the associated expression tree into native SQL using the query provider for the RDBMS you are targeting. The filtering then runs in the database, with all the benefits of SQL, e.g. use of indexes.

As per my comment, one of the common reasons why EF would not do this is that the IQueryable mechanism has been tampered with. For instance, your Repository pattern implementation may use the IEnumerable<T> overload of Where (which takes a Func<T, bool>) and not the IQueryable<T> overload (which takes an Expression<Func<T, bool>>).

As a result, EF has no other option but to fetch the table and execute every row against your predicate function to determine whether the row matches your predicate or not.
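A minimal sketch of how this can happen, assuming a hypothetical repository wrapper. The OP's actual categoryRepository isn't shown, so the two repository classes and their signatures below are illustrative only. An in-memory list wrapped with AsQueryable() stands in for the DbSet so that the sketch runs without a database. With a real DbSet, the first repository would cause the whole table to be fetched, while the second would hand the predicate to EF for translation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Category
{
    public int Id { get; set; }
    public bool Published { get; set; }
}

// Hypothetical repository that accepts a compiled delegate.
// The inner call binds to Enumerable.Where, so the expression tree is lost.
public class EnumerableRepository
{
    private readonly IQueryable<Category> _set;
    public EnumerableRepository(IQueryable<Category> set) { _set = set; }

    public IEnumerable<Category> Where(Func<Category, bool> predicate)
    {
        return _set.Where(predicate);
    }
}

// Hypothetical repository that accepts an expression tree.
// The inner call binds to Queryable.Where, so the query stays composable.
public class QueryableRepository
{
    private readonly IQueryable<Category> _set;
    public QueryableRepository(IQueryable<Category> set) { _set = set; }

    public IQueryable<Category> Where(Expression<Func<Category, bool>> predicate)
    {
        return _set.Where(predicate);
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for context.Categories (assumption: no database here).
        IQueryable<Category> set = new List<Category>
        {
            new Category { Id = 1, Published = true },
            new Category { Id = 2, Published = false },
        }.AsQueryable();

        var lost = new EnumerableRepository(set).Where(x => x.Published);
        var kept = new QueryableRepository(set).Where(x => x.Published);

        Console.WriteLine(lost is IQueryable<Category>); // False
        Console.WriteLine(kept is IQueryable<Category>); // True
    }
}
```

The calling code looks identical in both cases (the same lambda is passed), which is why this kind of problem is easy to miss: only the parameter type on the repository method decides whether the compiler produces a delegate or an expression tree.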

As an aside, there is some debate whether there is merit in wrapping a DbContext in a Repository and / or Unit Of Work wrapper, as a DbContext is Transactional, performs caching, can be mocked during unit testing, and now supports a wide range of databases.

Examples of where materialization happens and how this affects performance

(The point at which the Sql query is actually executed is often referred to as materialization)

I've excluded the OP's repository - i.e. we're using the DbContext directly.

Best:

var miniFoos = myDbContext.MyFooSet
  .Where(f => f.SomeProperty == "foo")
  .Select(f => new {...})
  .ToList();

This is good because both the predicate and the projection of just the columns we need are applied in SQL, before we materialize the data into a List (of an anonymous type).

OK:

var foos = myDbContext.MyFooSet
  .Where(f => f.SomeProperty == "foo")
  .ToList() // Or .AsEnumerable(), or other materialization methods
  .Select(f => new {...}); // Subset of fields

This isn't ideal, because although we've applied the .Where clause before we materialize, we're returning all columns of the Foo table, only to discard the ones we don't need. This means unnecessary I/O, and SQL might otherwise have been able to answer the query from an index alone.

Bad - Never do this

var foos = myDbContext.MyFooSet
  .AsEnumerable() // (or `ToList()`, same problem)
  .Where(f => f.SomeProperty == "foo")
  .Select(f => new {...}); // Subset of fields

This seems to be what the OP is experiencing - since the table is materialized BEFORE any .Where filtering takes place, the whole table IS retrieved to memory and filtering happens with Linq to Objects, instead.

This problem can also happen when applying custom .Where predicate builders which don't use Expressions, or which use IEnumerable<T> instead of IQueryable<T> - IEnumerable has no associated expression tree and can't be parsed to Sql.
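As a rough illustration of the predicate builder case, here is a sketch with a hypothetical FooPredicates helper (the class and its method names are assumptions, not part of EF or of the OP's code). It builds the same filter twice, once as a compiled delegate and once as an expression tree, and again uses an in-memory array with AsQueryable() in place of a DbSet so that it runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Foo
{
    public string SomeProperty { get; set; }
}

// Hypothetical predicate builders, for illustration only.
public static class FooPredicates
{
    // Compiled delegate: a query provider cannot look inside it.
    public static Func<Foo, bool> HasPropertyCompiled(string value)
    {
        return f => f.SomeProperty == value;
    }

    // Expression tree: a query provider can inspect and translate it.
    public static Expression<Func<Foo, bool>> HasProperty(string value)
    {
        return f => f.SomeProperty == value;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for myDbContext.MyFooSet (assumption: no database here).
        IQueryable<Foo> foos = new[]
        {
            new Foo { SomeProperty = "foo" },
            new Foo { SomeProperty = "bar" },
        }.AsQueryable();

        var expr = FooPredicates.HasProperty("foo");

        // The tree exposes its structure: the body is an equality node.
        Console.WriteLine(expr.Body.NodeType); // Equal

        // Expression overload: the result is still an IQueryable<Foo>.
        IQueryable<Foo> stillQueryable = foos.Where(expr);

        // Delegate overload: the result is only an IEnumerable<Foo>,
        // so any further operators run in memory via LINQ to Objects.
        IEnumerable<Foo> nowInMemory =
            foos.Where(FooPredicates.HasPropertyCompiled("foo"));

        Console.WriteLine(stillQueryable.Count()); // 1
        Console.WriteLine(nowInMemory.Count());    // 1
    }
}
```

Both versions return the same rows here, so the difference never shows up in results. Against a real database it only shows up in what is fetched, which is why declaring the static types explicitly (IQueryable<Foo> versus IEnumerable<Foo>) is a cheap way to catch it at compile time.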

StuartLC answered May 30 '26 04:05


