
Linq slowness materializing complex queries

I have often found that if I have too many joins in a Linq query (whether using Entity Framework or NHibernate) and/or the shape of the resulting anonymous class is too complex, Linq takes a very long time to materialize the result set into objects.

This is a generic question, but here's a specific example using NHibernate:

var libraryBookIdsWithShelfAndBookTagQuery = (from shelf in session.Query<Shelf>()
    join sbttref in session.Query<ShelfBookTagTypeCrossReference>() on
         shelf.ShelfId equals sbttref.ShelfId
    join bookTag in session.Query<BookTag>() on
         sbttref.BookTagTypeId equals (byte)bookTag.BookTagType
    join btbref in session.Query<BookTagBookCrossReference>() on
         bookTag.BookTagId equals btbref.BookTagId
    join book in session.Query<Book>() on
         btbref.BookId equals book.BookId
    join libraryBook in session.Query<LibraryBook>() on
         book.BookId equals libraryBook.BookId
    join library in session.Query<LibraryCredential>() on
         libraryBook.LibraryCredentialId equals library.LibraryCredentialId
    join lcsg in session
         .Query<LibraryCredentialSalesforceGroupCrossReference>()
          on library.LibraryCredentialId equals lcsg.LibraryCredentialId
    join userGroup in session.Query<UserGroup>() on
         lcsg.UserGroupOrganizationId equals userGroup.UserGroupOrganizationId
    where
         shelf.ShelfId == shelfId &&
         userGroup.UserGroupId == userGroupId &&
         !book.IsDeleted &&
         book.IsDrm != null &&
         book.BookFormatTypeId != null
    select new
    {
        Book = book,
        LibraryBook = libraryBook,
        BookTag = bookTag
    });

// add a couple of where clauses, then...
var result = libraryBookIdsWithShelfAndBookTagQuery.ToList();

I know it's not the query execution, because I put a sniffer on the database and I can see that the query is taking 0ms, yet the code is taking about a second to execute that query and bring back all of 11 records.

So yeah, this is an overly complex query, having 8 joins between 9 tables, and I could probably restructure it into several smaller queries. Or I could turn it into a stored procedure - but would that help?

What I'm trying to understand is, where is that red line crossed between a query that is performant and one that starts to struggle with materialization? What's going on under the hood? And would it help if this were a SP whose flat results I subsequently manipulate in memory into the right shape?

EDIT: in response to a request in the comments, here's the SQL emitted:

SELECT DISTINCT book4_.bookid                 AS BookId12_0_, 
                libraryboo5_.librarybookid    AS LibraryB1_35_1_, 
                booktag2_.booktagid           AS BookTagId15_2_, 
                book4_.title                  AS Title12_0_, 
                book4_.isbn                   AS ISBN12_0_, 
                book4_.publicationdate        AS Publicat4_12_0_, 
                book4_.classificationtypeid   AS Classifi5_12_0_, 
                book4_.synopsis               AS Synopsis12_0_, 
                book4_.thumbnailurl           AS Thumbnai7_12_0_, 
                book4_.retinathumbnailurl     AS RetinaTh8_12_0_, 
                book4_.totalpages             AS TotalPages12_0_, 
                book4_.lastpage               AS LastPage12_0_, 
                book4_.lastpagelocation       AS LastPag11_12_0_, 
                book4_.lexilerating           AS LexileR12_12_0_, 
                book4_.lastpageposition       AS LastPag13_12_0_, 
                book4_.hidden                 AS Hidden12_0_, 
                book4_.teacherhidden          AS Teacher15_12_0_, 
                book4_.modifieddatetime       AS Modifie16_12_0_, 
                book4_.isdeleted              AS IsDeleted12_0_, 
                book4_.importedwithlexile     AS Importe18_12_0_, 
                book4_.bookformattypeid       AS BookFor19_12_0_, 
                book4_.isdrm                  AS IsDrm12_0_, 
                book4_.lightsailready         AS LightSa21_12_0_, 
                libraryboo5_.bookid           AS BookId35_1_, 
                libraryboo5_.libraryid        AS LibraryId35_1_, 
                libraryboo5_.externalid       AS ExternalId35_1_, 
                libraryboo5_.totalcopies      AS TotalCop5_35_1_, 
                libraryboo5_.availablecopies  AS Availabl6_35_1_, 
                libraryboo5_.statuschangedate AS StatusCh7_35_1_, 
                booktag2_.booktagtypeid       AS BookTagT2_15_2_, 
                booktag2_.booktagvalue        AS BookTagV3_15_2_ 
FROM   shelf shelf0_, 
       shelfbooktagtypecrossreference shelfbookt1_, 
       booktag booktag2_, 
       booktagbookcrossreference booktagboo3_, 
       book book4_, 
       librarybook libraryboo5_, 
       library librarycre6_, 
       librarycredentialsalesforcegroupcrossreference librarycre7_, 
       usergroup usergroup8_ 
WHERE  shelfbookt1_.shelfid = shelf0_.shelfid 
       AND booktag2_.booktagtypeid = shelfbookt1_.booktagtypeid 
       AND booktagboo3_.booktagid = booktag2_.booktagid 
       AND book4_.bookid = booktagboo3_.bookid 
       AND libraryboo5_.bookid = book4_.bookid 
       AND librarycre6_.libraryid = libraryboo5_.libraryid 
       AND librarycre7_.librarycredentialid = librarycre6_.libraryid 
       AND usergroup8_.usergrouporganizationid = 
           librarycre7_.usergrouporganizationid 
       AND shelf0_.shelfid = @p0 
       AND usergroup8_.usergroupid = @p1 
       AND NOT ( book4_.isdeleted = 1 ) 
       AND ( book4_.isdrm IS NOT NULL ) 
       AND ( book4_.bookformattypeid IS NOT NULL ) 
       AND book4_.lightsailready = 1 

EDIT 2: Here's the performance analysis from ANTS Performance Profiler:

[Image: performance analysis by ANTS]

Shaul Behr asked Jul 30 '15 13:07



2 Answers

It is often considered good database practice to place large or very common sets of joins into views. ORMs don't let you ignore this, nor do they replace the decades spent fine-tuning databases to do these kinds of things efficiently. Refactor those joins into a single view, or a couple of views if that makes more sense in the wider context of your application.

NHibernate should be optimizing the query down and reducing the data so that .NET only has to deal with the important parts. However, if those domain objects are naturally large, that's still a lot of data. And if the result set has a large number of rows, that's a lot of objects being instantiated, even if the DB is able to return the set quickly. Refactoring this query into a view that returns only the data you actually need would also reduce object instantiation overhead.
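As a rough illustration of returning only the data you need, here is a minimal sketch that projects into a small flat type instead of loading whole entities. The Book class here is a trimmed-down stand-in for the mapped entity, and BookSummary and BookQueries are hypothetical names made up for this example; pick whichever columns the caller really uses.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the mapped Book entity.
public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public string Synopsis { get; set; }   // wide column the caller does not need
    public bool IsDeleted { get; set; }
}

// Hypothetical flat result type holding only what the caller uses.
public class BookSummary
{
    public int BookId { get; set; }
    public string Title { get; set; }
}

public static class BookQueries
{
    // Projects each row to a BookSummary rather than returning full Book entities.
    public static List<BookSummary> GetSummaries(IQueryable<Book> books)
    {
        return books
            .Where(b => !b.IsDeleted)
            .Select(b => new BookSummary { BookId = b.BookId, Title = b.Title })
            .ToList();
    }
}
```

Against NHibernate this would be called as BookQueries.GetSummaries(session.Query<Book>()); for a quick check without a database, an in-memory list wrapped with AsQueryable() works too. How much time the projection saves in a given case is something to measure rather than assume.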

Another thought would be to not do a .ToList(). Return the enumerable and let your code lazily consume the data.
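To show what lazy consumption means in general, here is a small sketch using plain LINQ to Objects. Rows is a hypothetical stand-in for a result source that yields one row at a time; whether NHibernate's LINQ provider actually streams rows this way, or builds the whole list on first enumeration, is not something this example establishes.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many rows the source has actually produced so far.
    public static int Produced;

    // Hypothetical stand-in for a result source that yields rows one at a time.
    public static IEnumerable<int> Rows(int count)
    {
        for (var i = 0; i < count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Returns the first odd row; enumeration stops as soon as it is found.
    public static int FirstOddRow(int count)
    {
        return Rows(count).Where(r => r % 2 == 1).First();
    }
}
```

Calling LazyDemo.FirstOddRow(1000) pulls only as many rows as it needs from the source, whereas LazyDemo.Rows(1000).ToList() produces all of them up front. The benefit only appears when the consumer can stop early or process rows one by one.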

Bigsby answered Sep 17 '22 20:09



According to the profiling information, CreateQuery takes 45% of the total execution time, while, as you mentioned, the query took 0ms when executed directly. But this alone is not enough to say there is a performance problem, because:

  1. You are running the query under the profiler, which has a significant impact on execution time.
  2. A profiler slows down all the code being profiled, but not the SQL execution time (because that happens on the SQL server), so everything else looks slower compared to the SQL statement.

So the ideal approach is to measure how long the entire code block takes without the profiler, measure how long the SQL query takes, and compare the two. If you do that, you will probably end up with different values.
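One way to take that measurement without a profiler attached is a plain Stopwatch around the code block. This is only a sketch: QueryTimer and Measure are hypothetical names made up for this example, and the action passed in stands for whatever code you want to time, such as the ToList() call from the question.

```csharp
using System;
using System.Diagnostics;

public static class QueryTimer
{
    // Runs the supplied action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}

// Usage sketch, assuming the query variable from the question is in scope:
//
//   var elapsed = QueryTimer.Measure(
//       () => libraryBookIdsWithShelfAndBookTagQuery.ToList());
//   Console.WriteLine("Total: " + elapsed.TotalMilliseconds + " ms");
```

Subtracting the SQL time reported by the database sniffer from this total gives a rough figure for everything that happens on the .NET side. It is worth running the measurement a few times, since the first run of any .NET code commonly includes one-time costs such as JIT compilation.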

However, I'm not saying that NHibernate's LINQ-to-SQL translation is optimized for every query you can come up with. There are other ways to deal with those situations in NHibernate, such as the QueryOver API, Criteria queries, HQL and, finally, raw SQL.

  1. Where is that red line crossed between a query that is performant and one that starts to struggle with materialization? What's going on under the hood?

This is a pretty hard question, and without detailed knowledge of NHibernate's LINQ provider it's hard to give an accurate answer. You can always try the different query mechanisms it offers and see which one works best for a given scenario.

  2. And would it help if this were a SP whose flat results I subsequently manipulate in memory into the right shape?

Yes, using an SP would probably make this run pretty fast, but it would also add maintenance overhead to your code base.

Low Flying Pelican answered Sep 17 '22 20:09
