Improving bulk insert performance in Entity framework [duplicate]



There are several possible improvements (if you are using DbContext):

Set:

yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;

Do SaveChanges() in batches of 100 inserts... or you can try batches of 1000 items and see the difference in performance.

Since during all these inserts the context is the same and keeps growing, you can rebuild your context object every 1000 inserts: var yourContext = new YourContext(); I think this is the big gain.

Applying these improvements to a data-import process of mine took it from 7 minutes down to 6 seconds.

The actual numbers... may not be 100 or 1000 in your case... try it and tweak it.
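
As a rough sketch of that pattern (YourContext, YourEntities and entitiesToImport are placeholder names, and 1000 is just a starting batch size to tune):

const int batchSize = 1000;

var context = new YourContext();
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;

int count = 0;
foreach (var entity in entitiesToImport)
{
    context.YourEntities.Add(entity);

    if (++count % batchSize == 0)
    {
        context.SaveChanges();
        context.Dispose();

        // Rebuild the context so the change tracker stays small.
        context = new YourContext();
        context.Configuration.AutoDetectChangesEnabled = false;
        context.Configuration.ValidateOnSaveEnabled = false;
    }
}

context.SaveChanges();
context.Dispose();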


There is no way to force EF to perform well when doing it this way. The problem is that EF executes each insert in a separate round trip to the database. Awesome, isn't it? Even DataSets supported batch processing. Check this article for some workarounds. Another workaround is a custom stored procedure accepting a table-valued parameter, but you need raw ADO.NET for that.
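
For illustration, a rough sketch of the table-valued-parameter workaround with plain ADO.NET; the dbo.EmployeeTableType type, the dbo.InsertEmployees procedure and the Employees/Text columns are made-up names that would have to exist in your database:

// Hypothetical database objects assumed by this sketch:
//   CREATE TYPE dbo.EmployeeTableType AS TABLE (Text NVARCHAR(100));
//   CREATE PROCEDURE dbo.InsertEmployees @Rows dbo.EmployeeTableType READONLY
//   AS INSERT INTO dbo.Employees (Text) SELECT Text FROM @Rows;

var table = new DataTable();
table.Columns.Add("Text", typeof(string));
foreach (var item in items)
    table.Rows.Add(item.Text);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.InsertEmployees", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var parameter = cmd.Parameters.AddWithValue("@Rows", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.EmployeeTableType";

    conn.Open();
    cmd.ExecuteNonQuery();   // all rows inserted in a single round trip
}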


Using the code below you can extend the partial context class with a method that takes a collection of entity objects and bulk copies them to the database. Simply replace the class name MyEntities with whatever your context class is named and add the file to your project, in the correct namespace. After that, all you need to do is call the BulkInsertAll method, handing over the entity objects you want to insert. Do not reuse the context class; instead, create a new instance every time you use it. This is required, at least in some versions of EF, since the authentication data associated with the SqlConnection used here gets lost after the class has been used once. I don't know why.

This version is for EF 5

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.Entity.Infrastructure;
using System.Data.Metadata.Edm;          // EF 5 location of MetadataWorkspace, GlobalItem, DataSpace
using System.Data.SqlClient;
using System.Linq;
using System.Reflection;
using System.Text.RegularExpressions;

public partial class MyEntities
{
    public void BulkInsertAll<T>(T[] entities) where T : class
    {
        var conn = (SqlConnection)Database.Connection;

        conn.Open();

        Type t = typeof(T);
        Set(t).ToString();
        var objectContext = ((IObjectContextAdapter)this).ObjectContext;
        var workspace = objectContext.MetadataWorkspace;
        var mappings = GetMappings(workspace, objectContext.DefaultContainerName, typeof(T).Name);

        var tableName = GetTableName<T>();
        var bulkCopy = new SqlBulkCopy(conn) { DestinationTableName = tableName };

        // Foreign key relations show up as virtual declared 
        // properties and we want to ignore these.
        var properties = t.GetProperties().Where(p => !p.GetGetMethod().IsVirtual).ToArray();
        var table = new DataTable();
        foreach (var property in properties)
        {
            Type propertyType = property.PropertyType;

            // Nullable properties need special treatment.
            if (propertyType.IsGenericType &&
                propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
            {
                propertyType = Nullable.GetUnderlyingType(propertyType);
            }

            // Since we cannot trust the CLR type properties to be in the same order as
            // the table columns we use the SqlBulkCopy column mappings.
            table.Columns.Add(new DataColumn(property.Name, propertyType));
            var clrPropertyName = property.Name;
            var tableColumnName = mappings[property.Name];
            bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(clrPropertyName, tableColumnName));
        }

        // Add all our entities to our data table
        foreach (var entity in entities)
        {
            var e = entity;
            table.Rows.Add(properties.Select(property => GetPropertyValue(property.GetValue(e, null))).ToArray());
        }

        // send it to the server for bulk execution
        bulkCopy.BulkCopyTimeout = 5 * 60;
        bulkCopy.WriteToServer(table);

        conn.Close();
    }

    private string GetTableName<T>() where T : class
    {
        var dbSet = Set<T>();
        var sql = dbSet.ToString();
        var regex = new Regex(@"FROM (?<table>.*) AS");
        var match = regex.Match(sql);
        return match.Groups["table"].Value;
    }

    private object GetPropertyValue(object o)
    {
        if (o == null)
            return DBNull.Value;
        return o;
    }

    private Dictionary<string, string> GetMappings(MetadataWorkspace workspace, string containerName, string entityName)
    {
        var mappings = new Dictionary<string, string>();
        var storageMapping = workspace.GetItem<GlobalItem>(containerName, DataSpace.CSSpace);
        dynamic entitySetMaps = storageMapping.GetType().InvokeMember(
            "EntitySetMaps",
            BindingFlags.GetProperty | BindingFlags.NonPublic | BindingFlags.Instance,
            null, storageMapping, null);

        foreach (var entitySetMap in entitySetMaps)
        {
            var typeMappings = GetArrayList("TypeMappings", entitySetMap);
            dynamic typeMapping = typeMappings[0];
            dynamic types = GetArrayList("Types", typeMapping);

            if (types[0].Name == entityName)
            {
                var fragments = GetArrayList("MappingFragments", typeMapping);
                var fragment = fragments[0];
                var properties = GetArrayList("AllProperties", fragment);
                foreach (var property in properties)
                {
                    var edmProperty = GetProperty("EdmProperty", property);
                    var columnProperty = GetProperty("ColumnProperty", property);
                    mappings.Add(edmProperty.Name, columnProperty.Name);
                }
            }
        }

        return mappings;
    }

    private ArrayList GetArrayList(string property, object instance)
    {
        var type = instance.GetType();
        var objects = (IEnumerable)type.InvokeMember(property, BindingFlags.GetProperty | BindingFlags.NonPublic | BindingFlags.Instance, null, instance, null);
        var list = new ArrayList();
        foreach (var o in objects)
        {
            list.Add(o);
        }
        return list;
    }

    private dynamic GetProperty(string property, object instance)
    {
        var type = instance.GetType();
        return type.InvokeMember(property, BindingFlags.GetProperty | BindingFlags.NonPublic | BindingFlags.Instance, null, instance, null);
    }
}

This version is for EF 6

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.Entity.Core.Metadata.Edm;   // EF 6 location of MetadataWorkspace, GlobalItem, DataSpace
using System.Data.Entity.Infrastructure;
using System.Data.SqlClient;
using System.Linq;
using System.Reflection;
using System.Text.RegularExpressions;

public partial class CMLocalEntities
{
    public void BulkInsertAll<T>(T[] entities) where T : class
    {
        var conn = (SqlConnection)Database.Connection;

        conn.Open();

        Type t = typeof(T);
        Set(t).ToString();
        var objectContext = ((IObjectContextAdapter)this).ObjectContext;
        var workspace = objectContext.MetadataWorkspace;
        var mappings = GetMappings(workspace, objectContext.DefaultContainerName, typeof(T).Name);

        var tableName = GetTableName<T>();
        var bulkCopy = new SqlBulkCopy(conn) { DestinationTableName = tableName };

        // Foreign key relations show up as virtual declared 
        // properties and we want to ignore these.
        var properties = t.GetProperties().Where(p => !p.GetGetMethod().IsVirtual).ToArray();
        var table = new DataTable();
        foreach (var property in properties)
        {
            Type propertyType = property.PropertyType;

            // Nullable properties need special treatment.
            if (propertyType.IsGenericType &&
                propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
            {
                propertyType = Nullable.GetUnderlyingType(propertyType);
            }

            // Since we cannot trust the CLR type properties to be in the same order as
            // the table columns we use the SqlBulkCopy column mappings.
            table.Columns.Add(new DataColumn(property.Name, propertyType));
            var clrPropertyName = property.Name;
            var tableColumnName = mappings[property.Name];
            bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(clrPropertyName, tableColumnName));
        }

        // Add all our entities to our data table
        foreach (var entity in entities)
        {
            var e = entity;
            table.Rows.Add(properties.Select(property => GetPropertyValue(property.GetValue(e, null))).ToArray());
        }

        // send it to the server for bulk execution
        bulkCopy.BulkCopyTimeout = 5*60;
        bulkCopy.WriteToServer(table);

        conn.Close();
    }

    private string GetTableName<T>() where T : class
    {
        var dbSet = Set<T>();
        var sql = dbSet.ToString();
        var regex = new Regex(@"FROM (?<table>.*) AS");
        var match = regex.Match(sql);
        return match.Groups["table"].Value;
    }

    private object GetPropertyValue(object o)
    {
        if (o == null)
            return DBNull.Value;
        return o;
    }

    private Dictionary<string, string> GetMappings(MetadataWorkspace workspace, string containerName, string entityName)
    {
        var mappings = new Dictionary<string, string>();
        var storageMapping = workspace.GetItem<GlobalItem>(containerName, DataSpace.CSSpace);
        dynamic entitySetMaps = storageMapping.GetType().InvokeMember(
            "EntitySetMaps",
            BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance,
            null, storageMapping, null);

        foreach (var entitySetMap in entitySetMaps)
        {
            var typeMappings = GetArrayList("EntityTypeMappings", entitySetMap);
            dynamic typeMapping = typeMappings[0];
            dynamic types = GetArrayList("Types", typeMapping);

            if (types[0].Name == entityName)
            {
                var fragments = GetArrayList("MappingFragments", typeMapping);
                var fragment = fragments[0];
                var properties = GetArrayList("AllProperties", fragment);
                foreach (var property in properties)
                {
                    var edmProperty = GetProperty("EdmProperty", property);
                    var columnProperty = GetProperty("ColumnProperty", property);
                    mappings.Add(edmProperty.Name, columnProperty.Name);
                }
            }
        }

        return mappings;
    }

    private ArrayList GetArrayList(string property, object instance)
    {
        var type = instance.GetType();
        var objects = (IEnumerable)type.InvokeMember(
            property, 
            BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance, null, instance, null);
        var list = new ArrayList();
        foreach (var o in objects)
        {
            list.Add(o);
        }
        return list;
    }

    private dynamic GetProperty(string property, object instance)
    {
        var type = instance.GetType();
        return type.InvokeMember(property, BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance, null, instance, null);
    }

}
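
For illustration, a minimal usage sketch of the method above; the entity collection is hypothetical, and the point is that a brand-new context instance is created for each call:

var customers = LoadCustomersToImport();   // hypothetical source of entity objects

// Create a fresh context for every bulk insert; do not reuse it.
using (var context = new MyEntities())
{
    context.BulkInsertAll(customers.ToArray());
}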

And finally, a little something for you Linq-To-Sql lovers.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Linq.Mapping;
using System.Data.SqlClient;
using System.Linq;
using System.Reflection;

partial class MyDataContext
{
    partial void OnCreated()
    {
        CommandTimeout = 5 * 60;
    }

    public void BulkInsertAll<T>(IEnumerable<T> entities)
    {
        entities = entities.ToArray();

        string cs = Connection.ConnectionString;
        var conn = new SqlConnection(cs);
        conn.Open();

        Type t = typeof(T);

        var tableAttribute = (TableAttribute)t.GetCustomAttributes(
            typeof(TableAttribute), false).Single();
        var bulkCopy = new SqlBulkCopy(conn) { 
            DestinationTableName = tableAttribute.Name };

        var properties = t.GetProperties().Where(EventTypeFilter).ToArray();
        var table = new DataTable();

        foreach (var property in properties)
        {
            Type propertyType = property.PropertyType;
            if (propertyType.IsGenericType &&
                propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
            {
                propertyType = Nullable.GetUnderlyingType(propertyType);
            }

            table.Columns.Add(new DataColumn(property.Name, propertyType));
        }

        foreach (var entity in entities)
        {
            table.Rows.Add(properties.Select(
              property => GetPropertyValue(
              property.GetValue(entity, null))).ToArray());
        }

        bulkCopy.WriteToServer(table);
        conn.Close();
    }

    private bool EventTypeFilter(System.Reflection.PropertyInfo p)
    {
        var attribute = Attribute.GetCustomAttribute(p, 
            typeof (AssociationAttribute)) as AssociationAttribute;

        if (attribute == null) return true;
        if (attribute.IsForeignKey == false) return true; 

        return false;
    }

    private object GetPropertyValue(object o)
    {
        if (o == null)
            return DBNull.Value;
        return o;
    }
}

Maybe this answer here will help you. It seems that you want to dispose of the context periodically. This is because the context gets bigger and bigger as the number of attached entities grows.


There are two major performance issues with your code:

  • Using Add method
  • Using SaveChanges

Using Add method

The Add method gets slower and slower with each entity you add.

See: http://entityframework.net/improve-ef-add-performance

For example, adding 10,000 entities via:

  • Add (takes ~105,000 ms)
  • AddRange (takes ~120 ms)

Note: the entities have not been saved to the database yet!

The problem is that the Add method calls DetectChanges for every entity added, while AddRange does it once, after all entities have been added to the context.

Common solutions are:

  • Use AddRange instead of Add (see the sketch after this list)
  • Set AutoDetectChanges to false
  • Split SaveChanges into multiple batches
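
A minimal sketch of the first two points, assuming EF 6 where DbSet.AddRange is available (the context and entitiesToInsert names are placeholders):

// Slow: Add runs DetectChanges on every call.
foreach (var entity in entitiesToInsert)
{
    context.MyEntities.Add(entity);
}

// Faster: disable automatic change detection and add the whole collection at once.
context.Configuration.AutoDetectChangesEnabled = false;
context.MyEntities.AddRange(entitiesToInsert);
context.SaveChanges();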

Using SaveChanges

Entity Framework was not designed for bulk operations. For every entity you save, a database round trip is performed.

So, if you want to insert 20,000 records, you will perform 20,000 database round trips, which is INSANE!

There are some third-party libraries supporting Bulk Insert available:

  • Z.EntityFramework.Extensions (Recommended)
  • EFUtilities
  • EntityFramework.BulkInsert

See: Entity Framework Bulk Insert library

Be careful when choosing a bulk insert library. Only Entity Framework Extensions supports all kinds of associations and inheritance, and it's the only one still supported.


Disclaimer: I'm the owner of Entity Framework Extensions

This library allows you to perform all bulk operations you need for your scenarios:

  • Bulk SaveChanges
  • Bulk Insert
  • Bulk Delete
  • Bulk Update
  • Bulk Merge

Example

// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize Primary Key
context.BulkMerge(customers, operation => {
   operation.ColumnPrimaryKeyExpression = 
        customer => customer.Code;
});

EDIT: Answering a question from the comments

Is there a recommended max size for each bulk insert for the library you created?

Not too high, not too low. There isn't a particular value that fits all scenarios, since it depends on multiple factors such as row size, indexes, triggers, etc.

It's normally recommended to be around 4,000.

Also, is there a way to tie it all into one transaction and not worry about it timing out?

You can use an Entity Framework transaction. Our library uses the transaction if one is started. But be careful: a transaction that takes too much time also comes with problems such as row/index/table locks.
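
For example, a rough sketch using EF6's Database.BeginTransaction together with the BulkInsert call shown above (the customers collection is a placeholder):

using (var transaction = context.Database.BeginTransaction())
{
    // BulkInsert picks up the already-started transaction.
    context.BulkInsert(customers);

    transaction.Commit();
}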


Currently there is no better way; however, you may get a marginal improvement by moving SaveChanges inside the loop and calling it for, say, every 10 items.

int i = 0;

foreach (Employees item in sequence)
{
   Employees t = new Employees();
   t.Text = item.Text;
   dataContext.Employees.AddObject(t);

   i++;

   // this will save at most 10 items together
   if ((i % 10) == 0)
   {
       dataContext.SaveChanges();
       // show some progress to the user based on
       // the value of i
   }
}
dataContext.SaveChanges();

You can adjust the 10 for better performance. It will not greatly improve speed, but it will allow you to show some progress to the user and make the process more user friendly.


In an Azure environment with a Basic website that has 1 instance, I tried to insert batches of 1,000 records at a time out of 25,000 records. Using a for loop it took 11.5 minutes, but with parallel execution it took less than a minute, so I recommend using the TPL (Task Parallel Library).

// 'members' is the full collection of entities to insert
var count = (members.Count() / 1000) + 1;
Parallel.For(0, count, x =>
{
    // Each parallel batch gets its own context and connection.
    ApplicationDbContext db1 = new ApplicationDbContext();
    db1.Configuration.AutoDetectChangesEnabled = false;

    var records = members.Skip(x * 1000).Take(1000).ToList();
    db1.Members.AddRange(records);

    db1.SaveChanges();
    db1.Dispose();
});