Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can data be streamed from a LINQ to Entities query?

I'm wondering how one would go about streaming data from SQL server using EF6.

Assume having these classes

  • PersonRepository
  • EFPerson (EF model)
  • DomainPerson (Domain model)
  • PersonUsingClass

Assume that PersonUsingClass depends on getting a bunch of DomainPersons. Assume that business rules dictate that EFPersons aren't allowed to leave the PersonRepository.

Normally I'd have a repository method looking like this:

    public IEnumerable<DomainPerson> GetPeople()
    {
        using (var db = new efContext())
        {
            IQueryable efPeople = db.Person.Where(someCriteria);

            foreach (var person in efPeople)
            {
                yield return person.ToDomainPerson();
            }
        }
    }

With the code I have here, everything would be loaded into memory upon doing the foreach. I could achieve the streaming by returning the IQueryable to the PersonUsingClass, but that exposes the EF models to that class, which is an unwanted scenario.

Is it really impossible to hide away the EF models why streaming data at the same time? Or is there something I don't know?

like image 753
Treeline Avatar asked Jul 25 '19 14:07

Treeline


2 Answers

The method you have created iterates over an IQueryable<> object that's created by EF.

The IQueryable<> variable has deferred execution, therefore internally, EF will only make the call to the database when the IQueryable<> is iterated over (i.e. when .MoveNext() is first called).

Additionally, if you have ever hand-rolled a database call using SqlDataReader, you'll see that it is possible to .Read() results of a query one-by-one, you don't need to load all records into memory. It's likely that the records are being streamed in this way by EF (this is an assumption on my part, it may depend on your specific EF setup).

Your method is returning an IEnumerable<> object which is also subject to deferred exeution. Creating an instance of this by calling GetPeople() will not lead to a database call.

When the result of your method is iterated over, you're then triggering the iteration over the internal IQueryable<> object and transforming the results one by one.

Therefore:

No records are being loaded into memory in that method (unless EF is doing some caching internally). If you iterate over the result of that method then you are iterating over each record one by one.

If you call .ToList() or .ToArray() on the result of that method then records will be loaded into memory.

like image 68
Oliver Avatar answered Oct 28 '22 06:10

Oliver


Entity Framework queries used to be buffering and could be made streaming by an AsStreaming extension method. But streaming has long been the default and the extension method still exists but is obsolete now (in EF6). That's one.

But don't forget EF's change tracker. By default, EF caches all entities it materializes in its change tracker, which is an identity cache. Therefore, even though the query is streaming, in order to prevent memory consumption you have to prevent EF from tracking entities. And that's exactly what's missing in your code.

Each iteration of the foreach loop attaches one Person instance to the change tracker.

Caching the entities can be prevented in two ways.

  1. Simply use db.Person.AsNoTracking().
  2. Project immediately. Projection creates objects of types that EF doesn't track.

The second method would look like:

var people = db.Person.Where(someCriteria).Select(p => p.ToDomainPerson());

But of course ToDomainPerson() can't be translated into SQL. Instead you should do something like:

db.Person.Where(someCriteria).Select(p => new DomainPerson
{
    Name = p.Name,
    ...
}
);

Or, better, use AutoMapper's ProjectTo method, which keeps your code just as DRY as this ToDomainPerson method.

The advantage of projecting immediately is that you only pull the required fields from the database and that no lazy loading will be triggered afterwards. Lazy loading can be a source of serialization problems or exceptions because the context is disposed when lazy loading is triggered.

like image 40
Gert Arnold Avatar answered Oct 28 '22 06:10

Gert Arnold