Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Inserting many rows in EF too slow, how to get primary key from SqlBulkCopy?

We have a use case in our application where a user triggers a request that will cause an insert of 100 to 1000 rows.
After that insert, we need the object to continue processing and create more objects which are foreign keys to the originally inserted objects, or in other words, we need the Primary Key ID of the inserted objects.

So far we have used EF to do that in a foreach loop, this was too slow, and was taking about 15-20 seconds to complete for about 600 rows. (while blocking the user, bad :( )

Original code (also handles updates, but we do not care about performance there, it is not blocking users):

foreach (Location updatedLoc in locationsLoaded)
{
    // find it in the collection from the database
    Location fromDb = existingLocations.SingleOrDefault(loc => loc.ExtId.Equals(updatedLoc.ExtId));

    // update or insert
    if (fromDb != null)
    {
        // link ids for update
        updatedLoc.Id = fromDb.Id;

        // set values for update 
        db.Entry(fromDb).CurrentValues.SetValues(updatedLoc);
    }
    else
    {
        System.Diagnostics.Trace.WriteLine("Adding new location: " + updatedLoc.Name, "loadSimple");

        // insert a new location <============ This is the bottleneck, takes about 20-40ms per row
        db.Locations.Add(updatedLoc);
    }
}

// This actually takes about 3 seconds for 600 rows, was actually acceptable
db.SaveChanges();

So after researching on SO and internet, I found out I was using EF the wrong way and need to use SqlBulkCopy

And thus the code was rewritten, and what used to take ~20 seconds, now takes ~100ms (!)

foreach (Location updatedLoc in locationsLoaded)
{
    // find it in the collection from the database
    Location fromDb = existingLocations.SingleOrDefault(loc => loc.ExtId.Equals(updatedLoc.ExtId));

    // update or insert
    if (fromDb != null)
    {
        // link ids for update
        updatedLoc.Id = fromDb.Id;

        // set values for update
        db.Entry(fromDb).CurrentValues.SetValues(updatedLoc);
    }
    else
    {
        System.Diagnostics.Trace.WriteLine("Adding new location: " + updatedLoc.Name, "loadSimple");

        // insert a new location
        dataTable.Rows.Add(new object[] { \\the 14 fields of the location.. });
    }
}

System.Diagnostics.Trace.WriteLine("preparing to bulk insert", "loadSimple");

// perform the bulk insert
using (var bulkCopy = new System.Data.SqlClient.SqlBulkCopy(System.Configuration.ConfigurationManager.ConnectionStrings["bulk-inserter"].ConnectionString))
{
    bulkCopy.DestinationTableName = "Locations";

    for (int i = 0; i < dataTable.Columns.Count; i++)
    {
        bulkCopy.ColumnMappings.Add(i, i + 1);
    }

    bulkCopy.WriteToServer(dataTable);
}

// for update
db.SaveChanges();

The problem is, after the bulk copy, the objects in the Locations collection, which is part of the EF ORM are not changed (that is okay and expected), but I need the inserted ids to continue work on these objects.

A simple solution would be to immediately select the data again from the database, I have a the data at hand, I can simply re-select it into a different collection.

But that solution feels incorrect, is there no way to get the ids as part of the insert.

EDIT: the simple solution works, see accepted answer below on how to easily sync it back to EF.

Maybe I should not use SqlBulkCopy (i expect up to about 1000 rows, no more) and use something else?

Please note, A few related SO questions and solutions, all seem to move away from EF..

  1. Possible to get PrimayKey IDs back after a SQL BulkCopy?
  2. Improving bulk insert performance in Entity framework
  3. Fastest Way of Inserting in Entity Framework (this is about SaveChanges() performance with many pending inserts, should call it every X inserts and not at the end of the processing with 1000s pending)
like image 627
Jonathan Levison Avatar asked Feb 20 '23 15:02

Jonathan Levison


1 Answers

Nothing you do through the EF will ever be as fast as SqlBulkCopy. Indeed, raw SQL INSERTs aren't as fast. So you just need to re-read Locations. Refresh the query by re-reading with MergeOption.OverwriteChanges.

like image 183
Craig Stuntz Avatar answered Apr 06 '23 19:04

Craig Stuntz