Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CodeFirst loading 1 parent linked to 25 000 children is slow

I searched a lot on my performance problem and tried all sorts of different things, but I just can't seem to get it to work fast enough. Here's my problem to it's simplest form:

I'm using entity framework 5 and I want to be able to lazy load child instances of a parent when the user selects that parent, so I don't have to pull the entire database. However I have been having performance problems with lazy loading the children. I think the problem is the wire up of the navigation properties between the Parent and the children. I'm also thinking it must be something I did wrong because I believe this is a simple case.

So I put up a program to test a single lazy load to isolate the problem.

Here's the Test:

I created a POCO Parent class and a Child POCO Class. Parent has n Children and Child has 1 Parent. There's only 1 parent in the SQL Server database and 25 000 children for that single parent. I tried different methods to load this data. Whenever I load either the children and the parent in the same DbContext, it takes a really long time. But if I load them in different DbContexts, it loads really fast. However, I want those instances to be in the same DbContext.

Here is my test setup and everything you need to replicate it:

POCOs:

public class Parent
{
    public int ParentId { get; set; }

    public string Name { get; set; }

    public virtual List<Child> Childs { get; set; }
}

public class Child
{
    public int ChildId { get; set; }

    public int ParentId { get; set; }

    public string Name { get; set; }

    public virtual Parent Parent { get; set; }
}

DbContext:

public class Entities : DbContext
{
    public DbSet<Parent> Parents { get; set; }

    public DbSet<Child> Childs { get; set; }
}

TSQL Script to create the Database and data:

USE [master]
GO

IF EXISTS(SELECT name FROM sys.databases
    WHERE name = 'PerformanceParentChild')
    alter database [PerformanceParentChild] set single_user with rollback immediate
    DROP DATABASE [PerformanceParentChild]
GO

CREATE DATABASE [PerformanceParentChild]
GO
USE [PerformanceParentChild]
GO
BEGIN TRAN T1;
SET NOCOUNT ON

CREATE TABLE [dbo].[Parents]
(
    [ParentId] [int] CONSTRAINT PK_Parents PRIMARY KEY,
    [Name] [nvarchar](200) NULL
)
GO

CREATE TABLE [dbo].[Children]
(
    [ChildId] [int] CONSTRAINT PK_Children PRIMARY KEY,
    [ParentId] [int] NOT NULL,
    [Name] [nvarchar](200) NULL
)
GO

INSERT INTO Parents (ParentId, Name)
VALUES (1, 'Parent')

DECLARE @nbChildren int;
DECLARE @childId int;

SET @nbChildren = 25000;
SET @childId = 0;

WHILE @childId < @nbChildren
BEGIN
   SET @childId = @childId + 1;
   INSERT INTO [dbo].[Children] (ChildId, ParentId, Name)
   VALUES (@childId, 1, 'Child #' + convert(nvarchar(5), @childId))
END

CREATE NONCLUSTERED INDEX [IX_ParentId] ON [dbo].[Children] 
(
    [ParentId] ASC
)
GO

ALTER TABLE [dbo].[Children] ADD CONSTRAINT [FK_Children.Parents_ParentId] FOREIGN KEY([ParentId])
REFERENCES [dbo].[Parents] ([ParentId])
GO

COMMIT TRAN T1;

App.config containing the connection string:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <connectionStrings>
    <add
      name="Entities"
      providerName="System.Data.SqlClient"
      connectionString="Server=localhost;Database=PerformanceParentChild;Trusted_Connection=true;"/>
  </connectionStrings>
</configuration>

Test console class:

class Program
{
    static void Main(string[] args)
    {
        List<Parent> parents;
        List<Child> children;

        Entities entities;
        DateTime before;
        TimeSpan childrenLoadElapsed;
        TimeSpan parentLoadElapsed;

        using (entities = new Entities())
        {
            before = DateTime.Now;
            parents = entities.Parents.ToList();
            parentLoadElapsed = DateTime.Now - before;
            System.Diagnostics.Debug.WriteLine("Load only the parent from DbSet:" + parentLoadElapsed.TotalSeconds + " seconds");
        }

        using (entities = new Entities())
        {
            before = DateTime.Now;
            children = entities.Childs.ToList();
            childrenLoadElapsed = DateTime.Now - before;
            System.Diagnostics.Debug.WriteLine("Load only the children from DbSet:" + childrenLoadElapsed.TotalSeconds + " seconds");
        }

        using (entities = new Entities())
        {
            before = DateTime.Now;
            parents = entities.Parents.ToList();
            parentLoadElapsed = DateTime.Now - before;

            before = DateTime.Now;
            children = entities.Childs.ToList();
            childrenLoadElapsed = DateTime.Now - before;
            System.Diagnostics.Debug.WriteLine("Load the parent from DbSet:" + parentLoadElapsed.TotalSeconds + " seconds" +
                                               ", then load the children from DbSet:" + childrenLoadElapsed.TotalSeconds + " seconds");
        }

        using (entities = new Entities())
        {
            before = DateTime.Now;
            children = entities.Childs.ToList();
            childrenLoadElapsed = DateTime.Now - before;

            before = DateTime.Now;
            parents = entities.Parents.ToList();
            parentLoadElapsed = DateTime.Now - before;


            System.Diagnostics.Debug.WriteLine("Load the children from DbSet:" + childrenLoadElapsed.TotalSeconds + " seconds" +
                                               ", then load the parent from DbSet:" + parentLoadElapsed.TotalSeconds + " seconds");
        }

        using (entities = new Entities())
        {
            before = DateTime.Now;
            parents = entities.Parents.ToList();
            parentLoadElapsed = DateTime.Now - before;

            before = DateTime.Now;
            children = parents[0].Childs;
            childrenLoadElapsed = DateTime.Now - before;
            System.Diagnostics.Debug.WriteLine("Load the parent from DbSet:" + parentLoadElapsed.TotalSeconds + " seconds" +
                                               ", then load the children from Parent's lazy loaded navigation property:" + childrenLoadElapsed.TotalSeconds + " seconds");
        }

        using (entities = new Entities())
        {
            before = DateTime.Now;
            parents = entities.Parents.Include(p => p.Childs).ToList();
            parentLoadElapsed = DateTime.Now - before;
            System.Diagnostics.Debug.WriteLine("Load the parent from DbSet and children from include:" + parentLoadElapsed.TotalSeconds + " seconds");

        }

        using (entities = new Entities())
        {
            entities.Configuration.ProxyCreationEnabled = false;
            entities.Configuration.AutoDetectChangesEnabled = false;
            entities.Configuration.LazyLoadingEnabled = false;
            entities.Configuration.ValidateOnSaveEnabled = false;

            before = DateTime.Now;
            parents = entities.Parents.Include(p => p.Childs).ToList();
            parentLoadElapsed = DateTime.Now - before;
            System.Diagnostics.Debug.WriteLine("Load the parent from DbSet and children from include:" + parentLoadElapsed.TotalSeconds + " seconds with everything turned off");

        }

    }
}

Here are the results of those test:

Load only the parent from DbSet:0,972 seconds

Load only the children from DbSet:0,714 seconds

Load the parent from DbSet:0,001 seconds, then load the children from DbSet:8,6026 seconds

Load the children from DbSet:0,6864 seconds, then load the parent from DbSet:7,5816159 seconds

Load the parent from DbSet:0 seconds, then load the children from Parent's lazy loaded navigation property:8,5644549 seconds

Load the parent from DbSet and children from include:8,6428788 seconds

Load the parent from DbSet and children from include:9,1416586 seconds with everything turned off

Analysis

Whenever the parent and the children are in the same DbContext, it takes a long time (9 seconds) to wire everything up. I even tried turning off everything from proxy creation to lazy loading, but to no avail. Can someone please help me ?

like image 255
CurlyFire Avatar asked Oct 15 '12 15:10

CurlyFire


2 Answers

I answered similar question previously. My previous answer contains theory answering this issue but with your detailed question I can directly point where the issue is. First lets run one of the problematic cases with performance profiler. This is result from DotTrace when using tracing mode:

enter image description here

Fixing relations runs in loop. It means that for 25.000 records you have 25.000 iterations but each of these iterations internally calls CheckIfNavigationPropertyContainsEntity on EntityCollection:

internal override bool CheckIfNavigationPropertyContainsEntity(IEntityWrapper wrapper)
{
    if (base.TargetAccessor.HasProperty)
    {
        object navigationPropertyValue = base.WrappedOwner.GetNavigationPropertyValue(this);
        if (navigationPropertyValue != null)
        {
            if (!(navigationPropertyValue is IEnumerable))
            {
                throw new EntityException(Strings.ObjectStateEntry_UnableToEnumerateCollection(base.TargetAccessor.PropertyName, base.WrappedOwner.Entity.GetType().FullName));
            }
            foreach (object obj3 in navigationPropertyValue as IEnumerable)
            {
                if (object.Equals(obj3, wrapper.Entity))
                {
                    return true;
                }
            }
        }
    }
    return false;
}

Number of iterations of inner loop grows as items are added to the navigation property. The math is in my previous answer - it is arithmetic series where the total number of iterations of inner loop is 1/2 * (n^2 - n) => n^2 complexity. The inner loop inside the outer loop results in 312.487.500 iterations in your case as also the performance tracing shows.

I created work item on EF CodePlex for this issue.

like image 177
Ladislav Mrnka Avatar answered Nov 08 '22 23:11

Ladislav Mrnka


This is not answer as I don't have a solution to improve the performance, but the comment section doesn't have enough space for the following. I just want to add a few additional tests and observations.

First, I could reproduce your measured times almost exactly for all seven tests. I have used EF 4.1 for the test.

Some interesting things to note:

  • From (the fast) test 2 I would conclude that object materialization (converting the rows and columns returned from the database server into objects) isn't slow.

  • This is also confirmed by loading the entities in test 3 without change tracking:

    parents = entities.Parents.AsNoTracking().ToList();
    // ...
    children = entities.Childs.AsNoTracking().ToList();
    

    This code runs fast although 25001 objects have to be materialized as well (but no relationships between the navigation properties will be established!).

  • Also from (the fast) test 2 I would conclude that creating entity snapshots for change tracking isn't slow.

  • In tests 3 and 4 the relationships between the parent and the 25000 children get fixed up when the entities are loaded from the database, i.e. EF adds all the Child entities to the parent's Childs collection and sets the Parent in each child to the loaded parent. Apparently this step is slow, as you already guessed:

    I think the problem is the wire up of the navigation properties between the Parent and the children.

    Especially the collection side of the relationship seems to be the problem: If you comment out the Childs navigation property in the Parent class (the relationship is still a required one-to-many relationship then) tests 3 and 4 are fast, although EF still sets the Parent property for all 25000 Child entities.

    I don't know why filling up the navigation collection during relationship fixup is so slow. If you simulate it manually in a naive manner, like so...

    entities.Configuration.ProxyCreationEnabled = false;
    
    children = entities.Childs.AsNoTracking().ToList();
    parents = entities.Parents.AsNoTracking().ToList();
    
    parents[0].Childs = new List<Child>();
    foreach (var c in children)
    {
        if (c.ParentId == parents[0].ParentId)
        {
            c.Parent = parents[0];
            parents[0].Childs.Add(c);
        }
    }
    

    ... it goes fast. Obviously relationship fixup internally doesn't work this simple way. Maybe it needs to be checked if the collection already contains the child to be tested:

    foreach (var c in children)
    {
        if (c.ParentId == parents[0].ParentId)
        {
            c.Parent = parents[0];
            if (!parents[0].Childs.Contains(c))
                parents[0].Childs.Add(c);
        }
    }
    

    This is significantly slower (around 4 seconds).

Anyway, relationship fixup seems to be the performance botteneck. I don't know how to improve it if you need change tracking and correct relations between your attached entities.

like image 34
Slauma Avatar answered Nov 09 '22 00:11

Slauma