Entity Framework, Bulk Inserts, and Maintaining Relationships

I have what would seem to be a common problem yet I cannot figure out how to achieve the desired outcome. I have a nested entity with navigation properties defined on it as seen in the following diagram.

[Diagram: a MapLayer holds a collection of MapLines, and each MapLine holds a collection of MapPoints, linked by navigation properties]

The map points collection can potentially be quite large for a given MapLine and there can be quite a large number of MapLines for a MapLayer.

The question here is what is the best approach for getting a MapLayer object inserted into the database using Entity Framework and still maintain the relationships that are defined by the navigation properties?

A standard Entity Framework implementation

dbContext.MapLayers.Add(mapLayer);
dbContext.SaveChanges();

causes a large memory spike and pretty poor return times.

I have tried implementing the EntityFramework.BulkInsert package but it does not honor the relationships of the objects.

This seems like a problem that someone has run into before, yet I can't seem to find any resources that explain how to accomplish this task.

Update

I have tried to implement the suggestion provided by Richard, but I am not understanding how I would go about this for a nested entity such as the one I have described. I am running under the assumption that I need to insert the MapLayer object, then the MapLines, then the MapPoints, to honor the PK/FK relationships in the database. I am currently trying the following code, but it does not appear to be correct.

dbContext.MapLayers.Add(mapLayer);
dbContext.SaveChanges();

List<MapLine> mapLines = new List<MapLine>();
List<MapPoint> mapPoints = new List<MapPoint>();
foreach (MapLine mapLine in mapLayer.MapLines)
{
    //Update each MapPoint's MapLine property to reference the current line object
    var updatedPoints = mapLine.MapPoints.Select(x => { x.MapLine = mapLine; return x; }).ToList();

    mapLines.Add(mapLine);
    mapPoints.AddRange(updatedPoints);
}

using (TransactionScope scope = new TransactionScope())
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;
        foreach (var entityToInsert in mapLines)
        {
            ++count;
            context = AddToContext(context, entityToInsert, count, 100, true);
        }

        context.SaveChanges();
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }

    scope.Complete();
}
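One way to honor the PK/FK chain for a graph like this is to save one level at a time and copy the generated keys down by foreign-key value before inserting the next level. The sketch below uses minimal stand-ins for the question's entities; the explicit FK property names (MapLayerId, MapLineId) are assumptions about the model, not taken from the original:

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Minimal stand-ins for the question's entities (shapes assumed).
class MapPoint { public int Id; public int MapLineId; }
class MapLine  { public int Id; public int MapLayerId; public List<MapPoint> MapPoints = new List<MapPoint>(); }
class MapLayer { public int Id; public List<MapLine> MapLines = new List<MapLine>(); }

static class FkFixup
{
    // After the parent level is saved (ids generated), copy the generated
    // ids down one level so children can be inserted by FK value alone --
    // bulk paths typically skip navigation-property fix-up.
    public static void AssignLineKeys(MapLayer layer)
    {
        foreach (var line in layer.MapLines)
            line.MapLayerId = layer.Id;
    }

    public static void AssignPointKeys(MapLayer layer)
    {
        foreach (var line in layer.MapLines)
            foreach (var point in line.MapPoints)
                point.MapLineId = line.Id;
    }
}
```

The flow would then be: SaveChanges on the layer, AssignLineKeys, save the lines, AssignPointKeys, save the points.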

Update 2

After having tried multiple different ways to achieve this, I finally gave up and just inserted the MapLayer as an entity and stored the MapLines => MapPoints relationship as a raw JSON string in a byte array on the MapLayer entity (as I am not querying against those structures, this works for me).

As the saying goes, "It ain't pretty, but it works".
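A minimal sketch of that pack/unpack step, assuming Json.NET as the serializer (any JSON serializer would do) and a byte[] column on MapLayer whose name is hypothetical:

```csharp
using System.Diagnostics;
using System.Text;
using Newtonsoft.Json;   // assumption: Json.NET is available

// Workaround sketch: persist the MapLines => MapPoints graph as an opaque
// JSON payload on the MapLayer row, since it is never queried relationally.
static class MapLayerPayload
{
    public static byte[] Pack<T>(T graph) =>
        Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(graph));

    public static T Unpack<T>(byte[] payload) =>
        JsonConvert.DeserializeObject<T>(Encoding.UTF8.GetString(payload));
}
```

Usage would be something like `mapLayer.LinesPayload = MapLayerPayload.Pack(mapLayer.MapLines);` before the single-entity insert, with the property name adjusted to the actual column.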

I did have some success with the BulkInsert package and managing the relationships outside of EF, but again ran into a memory problem when trying to use EF to pull the data back into the system. It seems that currently, EF is not capable of handling large datasets and complex relationships efficiently.

asked Jan 12 '15 by jamesamuir


2 Answers

I had a bad experience with huge context saves. All those recommendations about saving in iterations of 100 rows or 1000 rows, then disposing the context, clearing the list, detaching objects, assigning null to everything, etc. - it is all bullshit. We had a requirement to insert millions of rows daily into many tables. You definitely should not use Entity Framework under these conditions. You will be fighting memory leaks and a decrease in insertion speed as the iterations proceed.

Our first improvement was creating stored procedures and adding them to the model. That is 100 times faster than Context.SaveChanges(), and there are no leaks and no decrease in speed over time.
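As a sketch of that approach: the EXEC statement and its parameters can be built once and handed to `Database.ExecuteSqlCommand` (an EF6 DbContext API). The procedure name `dbo.InsertMapPoint` and its parameter list are assumptions for illustration, not from the original:

```csharp
using System.Data.SqlClient;
using System.Diagnostics;

// Sketch: a parameterised EXEC call per row. Match the procedure name and
// parameters to the stored procedure actually imported into the model.
static class PointInsertSql
{
    public const string Statement =
        "EXEC dbo.InsertMapPoint @MapLineId, @X, @Y";

    public static SqlParameter[] ParametersFor(int mapLineId, double x, double y) =>
        new[]
        {
            new SqlParameter("@MapLineId", mapLineId),
            new SqlParameter("@X", x),
            new SqlParameter("@Y", y),
        };
}

// Usage (DbContext assumed):
// context.Database.ExecuteSqlCommand(
//     PointInsertSql.Statement, PointInsertSql.ParametersFor(id, x, y));
```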

But it was not sufficient for us, and we decided to use SqlBulkCopy. It is super fast - 1000 times faster than using stored procedures.

So my suggestion would be: if you have many rows to insert but the count is under something like 50,000 rows, use stored procedures imported into the model; if you have hundreds of thousands of rows, go and try SqlBulkCopy.

Here is some code:

// Context here is an ObjectContext; with a DbContext, get it via
// ((IObjectContextAdapter)dbContext).ObjectContext.
EntityConnection ec = (EntityConnection)Context.Connection;
SqlConnection sc = (SqlConnection)ec.StoreConnection;

using (var copy = new SqlBulkCopy(sc, SqlBulkCopyOptions.CheckConstraints, null))
{
    copy.DestinationTableName = "TableName";
    copy.ColumnMappings.Add("SourceColumn", "DBColumn");
    copy.WriteToServer(dataTable);
}

If you use a DbTransaction with the context, you can make the bulk insert use that transaction as well, but it needs some hacks.
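One such sketch: build the DataTable in memory, then pass the context's underlying SqlTransaction into the SqlBulkCopy constructor so the bulk load commits or rolls back with the rest of the unit of work. The table and column names below are assumptions based on the question's entities:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;

// Build the staging DataTable for the MapPoints bulk load (columns assumed).
static DataTable BuildPointTable(IEnumerable<(int MapLineId, double X, double Y)> points)
{
    var table = new DataTable("MapPoints");
    table.Columns.Add("MapLineId", typeof(int));
    table.Columns.Add("X", typeof(double));
    table.Columns.Add("Y", typeof(double));
    foreach (var p in points)
        table.Rows.Add(p.MapLineId, p.X, p.Y);
    return table;
}

// Usage sketch (EF6): share the context's transaction with the bulk copy.
// var dbTx = context.Database.BeginTransaction();
// var sqlTx = (SqlTransaction)dbTx.UnderlyingTransaction;
// using (var copy = new SqlBulkCopy(connection, SqlBulkCopyOptions.CheckConstraints, sqlTx))
// {
//     copy.DestinationTableName = "MapPoints";
//     copy.WriteToServer(BuildPointTable(points));
// }
// dbTx.Commit();
```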

answered Oct 29 '22 by Giorgi Nakeuri


Bulk insert is not the only way of efficiently adding data using Entity Framework - a number of alternatives are detailed in this answer. You can use the optimisations suggested there (disabling change tracking) and then just add items as normal.

Note that as you are adding many items at once, you'll need to recreate your context fairly frequently to avoid the memory growth and slowdown that you'll otherwise get.
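A minimal sketch of that recreate-per-batch pattern, with the chunking logic kept pure so only the per-batch block is EF-specific (batch size 100 is illustrative, and MyDbContext is a placeholder name):

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Pure helper: split a sequence into fixed-size chunks; each chunk then
// gets a fresh DbContext so the change tracker never grows unbounded.
static IEnumerable<List<T>> Chunk<T>(IEnumerable<T> source, int size)
{
    var batch = new List<T>(size);
    foreach (var item in source)
    {
        batch.Add(item);
        if (batch.Count == size)
        {
            yield return batch;
            batch = new List<T>(size);
        }
    }
    if (batch.Count > 0)
        yield return batch;   // remainder
}

// Usage sketch (DbContext assumed):
// foreach (var batch in Chunk(mapPoints, 100))
//     using (var ctx = new MyDbContext())
//     {
//         ctx.Configuration.AutoDetectChangesEnabled = false;
//         ctx.MapPoints.AddRange(batch);
//         ctx.SaveChanges();   // fresh, empty tracker on the next batch
//     }
```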

answered Oct 28 '22 by Richard