Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Very poor performance for batch insert with SQL Server CE 4.0 and Entity Framework 4.2

I'm inserting a lot of data into SQL Server CE 4.0 using Entity Framework 4.2 (code-first), and the performance is abysmal when compared to direct SQL insertion.

The model is very simple:

public class DocMember
{
    public DocMember() { this.Items = new List<DocItem>(); }

    public int Id { get; set; }

    public string Name { get; set; }
    public string MemberType { get; set; }
    public string AssemblyName { get; set; }

    public virtual IList<DocItem> Items { get; set; }
}

public class DocItem
{
    public int Id { get; set; }
    public DocMember Member { get; set; }
    public string PartType { get; set; }
    public string PartName { get; set; }
    public string Text { get; set; }
}

I have 2623 DocMembers and a total of of 7747 DocItems to insert, and I'm getting the following execution times:

With SQL: 00:00:02.8
With EF:  00:03:02.2

I can understand there's a bit of overhead with EF, but it is 65 times slower than SQL!

Perhaps there's a problem in my code, but it is quite straightforward and I can't see what could be wrong:

    private TimeSpan ImportMembersEF(IList<DocMember> members)
    {
        using (var db = new DocEntities())
        {
            db.Database.CreateIfNotExists();

            var sw = Stopwatch.StartNew();
            foreach (var m in members)
            {
                db.Members.Add(m);
            }

            db.SaveChanges();
            sw.Stop();
            return sw.Elapsed;
        }
    }

I also tried to call SaveChanges for each inserted item, or every 100 or 200 items, to no avail (it actually makes it worse).

Is there a way to improve the performance, or do I have to use SQL for batch inserts?


EDIT: for completeness, here's the code for the SQL insertion: http://pastebin.com/aeaC1KcB

like image 425
Thomas Levesque Avatar asked Jan 23 '12 23:01

Thomas Levesque


2 Answers

You can use my SqlCeBulkCopy library for loading bulk data, it mimics the SqlBulkCopy api: http://sqlcebulkcopy.codeplex.com

like image 166
ErikEJ Avatar answered Sep 19 '22 10:09

ErikEJ


It's slow because it's not batching the insert.

When you insert using identity on the db, it has to select the result ID out after each item to assign to the model. This makes it really slow.

Your adhoc SQL isn't selecting the ID out so when batched you can submit all the statements at once.

Altho written with NHibernate:

http://www.philliphaydon.com/2011/09/the-benefits-of-letting-the-orm-generate-the-identity-part-1/

I wrote about using ORM generated IDs vs SQL Generated IDs.

like image 44
Phill Avatar answered Sep 23 '22 10:09

Phill