I have often found that if I have too many joins in a Linq query (whether using Entity Framework or NHibernate) and/or the shape of the resulting anonymous class is too complex, Linq takes a very long time to materialize the result set into objects.
This is a generic question, but here's a specific example using NHibernate:
var libraryBookIdsWithShelfAndBookTagQuery = (from shelf in session.Query<Shelf>()
join sbttref in session.Query<ShelfBookTagTypeCrossReference>() on
shelf.ShelfId equals sbttref.ShelfId
join bookTag in session.Query<BookTag>() on
sbttref.BookTagTypeId equals (byte)bookTag.BookTagType
join btbref in session.Query<BookTagBookCrossReference>() on
bookTag.BookTagId equals btbref.BookTagId
join book in session.Query<Book>() on
btbref.BookId equals book.BookId
join libraryBook in session.Query<LibraryBook>() on
book.BookId equals libraryBook.BookId
join library in session.Query<LibraryCredential>() on
libraryBook.LibraryCredentialId equals library.LibraryCredentialId
join lcsg in session
.Query<LibraryCredentialSalesforceGroupCrossReference>()
on library.LibraryCredentialId equals lcsg.LibraryCredentialId
join userGroup in session.Query<UserGroup>() on
lcsg.UserGroupOrganizationId equals userGroup.UserGroupOrganizationId
where
shelf.ShelfId == shelfId &&
userGroup.UserGroupId == userGroupId &&
!book.IsDeleted &&
book.IsDrm != null &&
book.BookFormatTypeId != null
select new
{
Book = book,
LibraryBook = libraryBook,
BookTag = bookTag
});
// add a couple of where clauses, then...
var result = libraryBookIdsWithShelfAndBookTagQuery.ToList();
I know it's not the query execution, because I put a sniffer on the database and I can see that the query is taking 0ms, yet the code is taking about a second to execute that query and bring back all of 11 records.
So yeah, this is an overly complex query, having 8 joins between 9 tables, and I could probably restructure it into several smaller queries. Or I could turn it into a stored procedure - but would that help?
What I'm trying to understand is, where is that red line crossed between a query that is performant and one that starts to struggle with materialization? What's going on under the hood? And would it help if this were a SP whose flat results I subsequently manipulate in memory into the right shape?
EDIT: in response to a request in the comments, here's the SQL emitted:
SELECT DISTINCT book4_.bookid AS BookId12_0_,
libraryboo5_.librarybookid AS LibraryB1_35_1_,
booktag2_.booktagid AS BookTagId15_2_,
book4_.title AS Title12_0_,
book4_.isbn AS ISBN12_0_,
book4_.publicationdate AS Publicat4_12_0_,
book4_.classificationtypeid AS Classifi5_12_0_,
book4_.synopsis AS Synopsis12_0_,
book4_.thumbnailurl AS Thumbnai7_12_0_,
book4_.retinathumbnailurl AS RetinaTh8_12_0_,
book4_.totalpages AS TotalPages12_0_,
book4_.lastpage AS LastPage12_0_,
book4_.lastpagelocation AS LastPag11_12_0_,
book4_.lexilerating AS LexileR12_12_0_,
book4_.lastpageposition AS LastPag13_12_0_,
book4_.hidden AS Hidden12_0_,
book4_.teacherhidden AS Teacher15_12_0_,
book4_.modifieddatetime AS Modifie16_12_0_,
book4_.isdeleted AS IsDeleted12_0_,
book4_.importedwithlexile AS Importe18_12_0_,
book4_.bookformattypeid AS BookFor19_12_0_,
book4_.isdrm AS IsDrm12_0_,
book4_.lightsailready AS LightSa21_12_0_,
libraryboo5_.bookid AS BookId35_1_,
libraryboo5_.libraryid AS LibraryId35_1_,
libraryboo5_.externalid AS ExternalId35_1_,
libraryboo5_.totalcopies AS TotalCop5_35_1_,
libraryboo5_.availablecopies AS Availabl6_35_1_,
libraryboo5_.statuschangedate AS StatusCh7_35_1_,
booktag2_.booktagtypeid AS BookTagT2_15_2_,
booktag2_.booktagvalue AS BookTagV3_15_2_
FROM shelf shelf0_,
shelfbooktagtypecrossreference shelfbookt1_,
booktag booktag2_,
booktagbookcrossreference booktagboo3_,
book book4_,
librarybook libraryboo5_,
library librarycre6_,
librarycredentialsalesforcegroupcrossreference librarycre7_,
usergroup usergroup8_
WHERE shelfbookt1_.shelfid = shelf0_.shelfid
AND booktag2_.booktagtypeid = shelfbookt1_.booktagtypeid
AND booktagboo3_.booktagid = booktag2_.booktagid
AND book4_.bookid = booktagboo3_.bookid
AND libraryboo5_.bookid = book4_.bookid
AND librarycre6_.libraryid = libraryboo5_.libraryid
AND librarycre7_.librarycredentialid = librarycre6_.libraryid
AND usergroup8_.usergrouporganizationid =
librarycre7_.usergrouporganizationid
AND shelf0_.shelfid = @p0
AND usergroup8_.usergroupid = @p1
AND NOT ( book4_.isdeleted = 1 )
AND ( book4_.isdrm IS NOT NULL )
AND ( book4_.bookformattypeid IS NOT NULL )
AND book4_.lightsailready = 1
EDIT 2: Here's the performance analysis from ANTS Performance Profiler:
It is often database "good" practice to place lots of joins or super common joins into views. ORMs don't let you ignore these facts nor do they supplement the decades of time spent fine tuning databases to do these kinds of things efficiently. Refactor those joins into a singular view or a couple views if that'd make more sense in the greater perspective of your application.
NHibernate should be optimizing the query down and reducing the data so that .Net only has to mess with the important parts. However, if those domain objects are just naturally large, that's still a lot of data. Also, if it's a really large result set in terms of rows returned, that's a lot of objects getting instantiated even if the DB is able to return the set quickly. Refactoring this query into a view that only returns the data you actually need would also reduce object instantiation overhead.
Another thought would be to not do a .ToList()
. Return the enumerable and let your code lazily consume the data.
According to profiling information, the CreateQuery
takes 45% of the total execution time. However as you mentioned the query took 0ms when you executed directly. But this alone is not enough to say there is a performance problem because,
so ideal scenario is to measure how long it takes to execute entire code block, measure time for SQL query and calculate times, and if you do that you will probably end up with different values.
However, I'm not saying that the the NH Linq to SQL implementation is optimized for any query you come up with, but there are other ways in NHibernate to deal with those situations such as QueryOverAPI, CriteriaQueries, HQL and finally SQL.
Where is that red line crossed between a query that is performant and one that starts to struggle with materialization. What's going on under the hood?
This one is pretty hard question and without having detail knowledge of NHibernate Linq to SQL provider it's hard to provide a accurate answer. You can always try different mechanisms provided and see which one is the best for given scenario.
And would it help if this were a SP whose flat results I subsequently manipulate in memory into the right shape?
Yes, using a SP would help things to work pretty fast, but using SP would add more maintenance problems to your code base.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With