
Complexity limits of Linq queries

I'm a big fan of Linq, and I have been really enjoying the power of expression trees etc. But I have found that whenever I try to get too clever with my queries, I hit some kind of limitation in the framework: while the query takes a very short time to run on the database (as shown by the performance analyzer), the results take ages to materialize. When that happens I know I've been too fancy, and I start breaking the query up into smaller, bite-sized chunks. So I have a workaround, though it might not always be optimal.

But I'd like to understand:

  • What is it that pushes the Linq framework over the edge in terms of materializing the query results?
  • Where can I read about the mechanism of materializing query results?
  • Is there a certain measurable complexity limit for Linq queries that should be avoided?
  • What design patterns are known to cause this problem, and what patterns can remedy it?

EDIT: As requested in comments, here's an example of a query that I measured to run on SQL Server in a few seconds, but took almost 2 minutes to materialize. I'm not going to try explaining all the stuff in context; it's here just so you can view the constructs and see an example of what I'm talking about:

Expression<Func<Staff, TeacherInfo>> teacherInfo =
    st => new TeacherInfo
    {
        ID = st.ID,
        Name = st.FirstName + " " + st.LastName,
        Email = st.Email,
        Phone = st.TelMobile,
    };

var step1 =
    currentReportCards.AsExpandable()
        .GroupJoin(db.ScholarReportCards,
                             current =>
                             new { current.ScholarID, current.AcademicTerm.AcademicYearID },
                             past => new { past.ScholarID, past.AcademicTerm.AcademicYearID },
                             (current, past) => new
                             {
                                 Current = current,
                                 PastCards =
                                 past.Where(
                                     rc =>
                                     rc.AcademicTerm.StartDate <
                                     current.AcademicTerm.StartDate &&
                                     rc.AcademicTerm.Grade == current.AcademicTerm.Grade &&
                                     rc.AcademicTerm.SchoolID == current.AcademicTerm.SchoolID)
                             });
// This materialization is what takes a long time:
var subjects = step1.SelectMany(x => from key in x.Current.Subjects
                            .Select(s => new { s.Subject.SubjectID, s.Subject.SubjectCategoryID })
                            .Union(x.PastCards.SelectMany(c => c.Subjects)
                                            .Select(
                                                s => new { s.Subject.SubjectID, s.Subject.SubjectCategoryID }))
         join cur in x.Current.Subjects on key equals
             new { cur.Subject.SubjectID, cur.Subject.SubjectCategoryID } into jcur
         from cur in jcur.DefaultIfEmpty()
         join past in x.PastCards.SelectMany(p => p.Subjects) on key equals
             new { past.Subject.SubjectID, past.Subject.SubjectCategoryID } into past
         select new
         {
             x.Current.ScholarID,
             IncludeInContactSection =
                 // ReSharper disable ConstantNullCoalescingCondition
                (bool?)cur.Subject.IncludeInContactSection ?? false,
             IncludeGrades = (bool?)cur.Subject.IncludeGrades ?? true,
             // ReSharper restore ConstantNullCoalescingCondition
             SubjectName =
                cur.Subject.Subject.Name ?? past.FirstOrDefault().Subject.Subject.Name,
             SubjectCategoryName = cur.Subject.SubjectCategory.Description,
             ClassInfo = (from ce in myDb.ClassEnrollments
                             .Where(
                                 ce =>
                                 ce.Class.SubjectID == cur.Subject.SubjectID
                                 && ce.ScholarID == x.Current.ScholarID)
                             .Where(enrollmentExpr)
                             .OrderByDescending(ce => ce.TerminationDate ?? DateTime.Today)
                     let teacher = ce.Class.Teacher
                     let secTeachers = ce.Class.SecondaryTeachers
                     select new
                     {
                         ce.Class.Nickname,
                         Primary = teacherInfo.Invoke(teacher),
                         Secondaries = secTeachers.AsQueryable().AsExpandable()
                            .Select(ti => teacherInfo.Invoke(ti))
                     })
                .FirstOrDefault(),
             Comments = cur.Comments
                 .Select(cc => new
                 {
                     Staff = cc.Staff.FirstName + " "
                                     + cc.Staff.LastName,
                     Comment = cc.CommentTemplate.Text ??
                                         cc.CommentFreeText
                 }),
             // ReSharper disable ConstantNullCoalescingCondition
             DisplayOrder = (byte?)cur.Subject.DisplayOrder ?? (byte)99,
             // ReSharper restore ConstantNullCoalescingCondition
             cur.Percentile,
             cur.Score,
             cur.Symbol,
             cur.MasteryLevel,
             PastScores = past.Select(p => new
                {
                    p.Score,
                    p.Symbol,
                    p.MasteryLevel,
                    p.ScholarReportCard
                     .AcademicTermID
                }),
             Assessments = cur.Assessments
                 .Select(a => new
                 {
                     a.ScholarAssessment.AssessmentID,
                     a.ScholarAssessment.Assessment.Description,
                     a.ScholarAssessment.Assessment.Type.Nickname,
                     a.ScholarAssessment.AssessmentDate,
                     a.ScoreDesc,
                     a.ScorePerc,
                     a.MasteryLevel,
                     a.ScholarAssessment.Assessment.Type.AssessmentFormat,
                     a.ScholarAssessment.PublishedStatus,
                     a.ScholarAssessment.FPScore,
                     a.ScholarAssessment.TotalScore,
                     a.ScholarAssessment.Assessment.Type.ScoreType,
                     a.ScholarAssessment.Assessment.Type.OverrideBelowLabel,
                     a.ScholarAssessment.Assessment.Type.OverrideApproachingLabel,
                     a.ScholarAssessment.Assessment.Type.OverrideMeetingLabel,
                     a.ScholarAssessment.Assessment.Type.OverrideExceedingLabel,
                 })
            })
        .ToList();
Shaul Behr asked Mar 24 '14 08:03


1 Answer

Linq uses deferred execution: a query is only executed when its results are enumerated, for example while iterating through an IEnumerable<> or calling ToList(). So what you call materialization includes some actual data fetching.

var reportCards = db.ScholarReportCards.Where(cr => ...); // this only prepares the query; nothing is sent to the DB yet
foreach (var rc in reportCards) {} // enumerating executes the query and calls the DB
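
To make the timing visible without a database, here is a minimal sketch using plain Linq to Objects. The list of scores and the class name are made up for illustration; the counter simply records how often the predicate runs. It suggests that defining the query does no work, and that each enumeration repeats the work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns the number of predicate calls observed at three checkpoints.
    public static int[] CountPredicateCalls()
    {
        var scores = new List<int> { 55, 72, 90, 38 }; // illustrative data only
        int calls = 0;

        // Building the query runs nothing; the lambda is only stored.
        IEnumerable<int> passing = scores.Where(s => { calls++; return s >= 60; });
        int afterDefinition = calls;

        // Enumerating executes the query: the predicate runs once per element.
        List<int> firstPass = passing.ToList();
        int afterFirstEnumeration = calls;

        // Enumerating the same query again repeats all of the work.
        int passingCount = passing.Count();
        int afterSecondEnumeration = calls;

        return new[] { afterDefinition, afterFirstEnumeration, afterSecondEnumeration };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CountPredicateCalls())); // prints "0, 4, 8"
    }
}
```

With a database-backed IQueryable<> the same rule applies, except that each enumeration is a round trip to the server rather than a loop over a list.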

I think that if you trace/time the queries on your SQL Server, you may see some queries arriving during the "materialization" step. This problem may even be exacerbated by anti-patterns such as the "Select N+1" problem. For example, it looks like you're not including the AcademicTerm objects in your request; if you don't, resolving them will result in a Select N+1: for every ScholarReportCard there will be a separate call to the DB to lazily resolve the attached AcademicTerm.
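
The cost of Select N+1 can be sketched with a fake in-memory "database" that counts round trips. Everything here (FakeDb, ReportCard, CardWithTerm, the method names) is a hypothetical stand-in, not your real model or any ORM's API; it only illustrates how lazy resolution multiplies the number of calls compared with fetching the related rows in one go.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for two tables; not the asker's real entities.
public class AcademicTerm { public int ID; public string Name; }
public class ReportCard { public int ID; public int AcademicTermID; }
public class CardWithTerm { public ReportCard Card; public AcademicTerm Term; }

public class FakeDb
{
    public int QueryCount; // number of simulated round trips

    private readonly List<AcademicTerm> terms = new List<AcademicTerm>
    {
        new AcademicTerm { ID = 1, Name = "Fall" },
        new AcademicTerm { ID = 2, Name = "Spring" }
    };

    private readonly List<ReportCard> cards = new List<ReportCard>
    {
        new ReportCard { ID = 10, AcademicTermID = 1 },
        new ReportCard { ID = 11, AcademicTermID = 2 },
        new ReportCard { ID = 12, AcademicTermID = 1 }
    };

    // One round trip for the parent rows only.
    public List<ReportCard> GetCards() { QueryCount++; return cards.ToList(); }

    // One round trip per lookup: this is what lazy loading does behind the scenes.
    public AcademicTerm GetTerm(int id) { QueryCount++; return terms.First(t => t.ID == id); }

    // One round trip that joins parent and related rows (the eager-loading idea).
    public List<CardWithTerm> GetCardsWithTerms()
    {
        QueryCount++;
        return cards.Join(terms, c => c.AcademicTermID, t => t.ID,
                          (c, t) => new CardWithTerm { Card = c, Term = t }).ToList();
    }
}

public static class SelectNPlusOneDemo
{
    // 1 query for the cards + 1 query per card for its term.
    public static int LazyRoundTrips()
    {
        var db = new FakeDb();
        foreach (var card in db.GetCards())
        {
            var term = db.GetTerm(card.AcademicTermID);
        }
        return db.QueryCount;
    }

    // A single joined query, however many cards there are.
    public static int EagerRoundTrips()
    {
        var db = new FakeDb();
        foreach (var row in db.GetCardsWithTerms())
        {
            var name = row.Term.Name;
        }
        return db.QueryCount;
    }

    public static void Main()
    {
        Console.WriteLine("lazy: " + LazyRoundTrips());   // prints "lazy: 4"
        Console.WriteLine("eager: " + EagerRoundTrips()); // prints "eager: 1"
    }
}
```

With three report cards the lazy path makes four round trips and the joined path makes one; on real data the gap grows with the number of rows and with every additional navigation property touched per row.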

If we focus on the Linq-to-DB aspect, at least try to avoid:

  • Select N+1: Include the related tables you will need up front.
  • Selecting too much data: select only the columns you need, and Include only the tables you need.
samy answered Sep 22 '22 02:09
