C#, EF & LINQ : slow at inserting large (7Mb) records into SQL Server

There's a long version of this question, and a short version.

The short version:

Why are both LINQ and EF so slow at inserting a single, large (7Mb) record into a remote SQL Server database?

And here's the long version (with some information about workarounds, which might be useful to other readers):

All of the following example code runs okay, but because my users are in Europe and our data centers are based in America, it is damned slow. If I run the same code on a virtual PC in America, it runs instantly. (And no, sadly, my company wants to keep all data in-house, so I can't use Azure, Amazon's cloud services, etc.)

Quite a few of my corporate apps involve reading/writing data from Excel into SQL Server, and often, we'll want to save a raw-copy of the Excel file in a SQL Server table.

This is very straightforward to do: simply read in the raw data from a local file, and save it into a record.

private int SaveFileToSQLServer(string filename)
{
    //  Read in an Excel file, and store it in a SQL Server [External_File] record.
    //
    //  Returns the ID of the [External_File] record which was added.
    //

    DateTime lastModified = System.IO.File.GetLastWriteTime(filename);
    byte[] fileData = File.ReadAllBytes(filename);

    //  Create a new SQL Server database record, containing our file's raw data 
    //  (Note: the table has an IDENTITY Primary-Key, so will generate a ExtFile_ID for us.)
    External_File newFile = new External_File()
    {
        ExtFile_Filename = System.IO.Path.GetFileName(filename),
        ExtFile_Data = fileData,
        ExtFile_Last_Modified = lastModified,
        Update_By = "mike",
        Update_Time = DateTime.UtcNow
    };
    dc.External_Files.InsertOnSubmit(newFile);
    dc.SubmitChanges(); 

    return newFile.ExtFile_ID;
}

Yup, no surprises there, and it works fine.
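The Entity Framework flavour is virtually the same shape. Here's a minimal sketch (assuming a DbContext exposing the same External_File entity; the MyDbContext class name is illustrative):

private int SaveFileToSQLServerEF(string filename)
{
    //  A minimal EF sketch of the same insert. MyDbContext is a hypothetical
    //  DbContext exposing a DbSet<External_File> called External_Files.
    byte[] fileData = File.ReadAllBytes(filename);

    External_File newFile = new External_File()
    {
        ExtFile_Filename = System.IO.Path.GetFileName(filename),
        ExtFile_Data = fileData,
        ExtFile_Last_Modified = System.IO.File.GetLastWriteTime(filename),
        Update_By = "mike",
        Update_Time = DateTime.UtcNow
    };

    using (var db = new MyDbContext())
    {
        db.External_Files.Add(newFile);
        db.SaveChanges();       //  the IDENTITY value is populated here
    }
    return newFile.ExtFile_ID;
}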

But what I noticed is that, for large Excel files (7-8Mb), this code to insert one (large!) record would take 40-50 seconds to run. I put it on a background thread, and it all worked fine; but, of course, if the user quit my application, that process would get killed off, which caused problems.

As a test, I tried replacing this function with code that would:

  • copy the file into a shared directory on the SQL Server machine
  • call a stored procedure to read the raw data (blob) into the same table

Using this method, the entire process would take just 3-4 seconds.
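In C#, that workaround boils down to one File.Copy and one ExecuteScalar call. A minimal sketch, assuming the SQL Server machine exposes its import folder as a share (the share path, local path and connectionString below are placeholders):

private int SaveFileViaServerShare(string filename)
{
    //  Sketch of the workaround: copy the file to a share on the SQL Server
    //  machine, then let the server bulk-load it from its own local disk.
    //  The paths and connection string are placeholders.
    string serverShare = @"\\SQLSERVER01\ImportData\";   //  the folder, as seen from here
    string serverLocal = @"D:\ImportData\";              //  the same folder, as seen by SQL Server

    File.Copy(filename, serverShare + Path.GetFileName(filename), true);

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.UploadFileToDatabase", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@LocalFilename", serverLocal + Path.GetFileName(filename));
        conn.Open();
        //  The procedure SELECTs the new [External_File] ID (or -1 on failure)
        return Convert.ToInt32(cmd.ExecuteScalar());
    }
}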

If you're interested, here's the Stored Procedure I used to upload a file (which MUST be stored in a folder on the SQL Server machine itself) into a database record:

CREATE PROCEDURE [dbo].[UploadFileToDatabase]
    @LocalFilename nvarchar(400)
AS
BEGIN
    --  By far, the quickest way to do this is to copy the file onto the SQL Server machine, then call this stored
    --  procedure to read the raw data into a [External_File] record, and link it to the Pricing Account record.
    --
    --      EXEC [dbo].[UploadFileToDatabase] 'D:\ImportData\SomeExcelFile.xlsm'
    -- 
    --  Returns: -1 if something went wrong  (eg file didn't exist) or the ID of our new [External_File] record
    --
    --  Note that the INSERT will go wrong, if the user doesn't have "bulkadmin" rights.
    --      "You do not have permission to use the bulk load statement."
    --  EXEC master..sp_addsrvrolemember @loginame = N'GPP_SRV', @rolename = N'bulkadmin'
    --
    SET NOCOUNT ON;

    DECLARE 
        @filename nvarchar(300),        --  eg "SomeFilename.xlsx"  (without the path)
        @SQL nvarchar(2000),
        @New_ExtFile_ID int

    --  Extract (just) the filename from our Path+Filename parameter
    SET @filename = RIGHT(@LocalFilename,charindex('\',reverse(@LocalFilename))-1)

    SET @SQL = 'INSERT INTO [External_File]  ([ExtFile_Filename], [ExtFile_Data]) '
    SET @SQL = @SQL + 'SELECT ''' + @filename + ''', *'
    SET @SQL = @SQL + ' FROM OPENROWSET(BULK ''' + @LocalFilename +''', SINGLE_BLOB) rs'

    PRINT convert(nvarchar, GetDate(), 108) + ' Running: ' + @SQL
    BEGIN TRY
        EXEC (@SQL)
        SELECT @New_ExtFile_ID = @@IDENTITY
    END TRY
    BEGIN CATCH
        PRINT convert(nvarchar, GetDate(), 108) + ' An exception occurred.'
        SELECT -1
        RETURN
    END CATCH

    PRINT convert(nvarchar, GetDate(), 108) + ' Finished.'

    --  Return the ID of our new [External_File] record
    SELECT @New_ExtFile_ID
END

The key to this code is that it builds up a SQL command like this:

INSERT INTO [External_File]  ([ExtFile_Filename], [ExtFile_Data])
SELECT 'SomeFilename.xlsm', * FROM OPENROWSET(BULK N'D:\ImportData\SomeExcelFile.xlsm', SINGLE_BLOB) rs

.. and, as the database and the file to be uploaded are on the same machine, this runs almost instantly.

As I said, overall it took 3-4 seconds to copy the file to a folder on the SQL Server machine and run this stored procedure, compared to 40-50 seconds to do the same using C# code with LINQ or EF.

Exporting blob data from SQL Server into an external file

And, of course, the same is true in the opposite direction.

First, I wrote some C#/LINQ code to load the one (7Mb!) database record and write its binary data into a raw file. This took about 30-40 seconds to run.
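That code was nothing unusual; roughly this (a sketch, reusing the dc LINQ-to-SQL DataContext from the insert example above):

private void SaveFileFromSQLServer(int extFileID, string targetFilename)
{
    //  Sketch of the slow path: pull the whole blob over the wire with
    //  LINQ, then write it out as a local file.
    External_File file = dc.External_Files.First(ef => ef.ExtFile_ID == extFileID);
    File.WriteAllBytes(targetFilename, file.ExtFile_Data.ToArray());
}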

But if I exported the SQL Server data to a file (saved on the SQL Server machine) first..

EXEC master..xp_cmdshell 'BCP "select ef.ExtFile_Data FROM [External_File] ef where ExtFile_ID = 585" queryout "D:\ImportData\SomeExcelFile.xlsx" -T -N'

...and then copied the file from the SQL Server folder to the user's folder, then once again, it ran in a couple of seconds.
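Driven from C#, that fast path is again just one command plus a file copy. A sketch (the paths and connectionString are placeholders, and xp_cmdshell has to be enabled on the server):

private void SaveFileViaBCP(int extFileID, string targetFilename)
{
    //  Sketch of the fast export: have SQL Server BCP the blob onto its own
    //  disk, then copy the result across the network to the user's machine.
    string serverLocal = @"D:\ImportData\Export.xlsx";              //  as seen by SQL Server
    string serverShare = @"\\SQLSERVER01\ImportData\Export.xlsx";   //  the same file, via the share

    string bcpCommand = "BCP \"select ef.ExtFile_Data FROM [External_File] ef "
                      + "where ExtFile_ID = " + extFileID + "\" "
                      + "queryout \"" + serverLocal + "\" -T -N";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("EXEC master..xp_cmdshell @cmd", conn))
    {
        cmd.Parameters.AddWithValue("@cmd", bcpCommand);
        conn.Open();
        cmd.ExecuteNonQuery();
    }

    File.Copy(serverShare, targetFilename, true);
}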

And this is my question: why are both LINQ and EF so bad at inserting a single large record into the database?

I assume the latency (the distance between us here in Europe and our data centers in the States) is a major cause of the delay, but it's just odd that a bog-standard file copy can be so much faster.

Am I missing something ?

Obviously, I've found workarounds to these problems, but they involve adding extra permissions to our SQL Server machines and shared folders on the SQL Server machines, and our DBAs really don't like granting rights for things like "xp_cmdshell"...

A few months later...

I had the same issue again this week, and tried Kevin H's suggestion of using SqlBulkCopy to insert a large (6Mb) record into SQL Server.

Using bulk copy, it took around 90 seconds to insert the 6Mb record, even though our data centre is 6,000 miles away.

So, the moral of the story: when inserting very large database records, avoid using a regular SubmitChanges() command, and stick to using bulk copy.

asked Jul 10 '15 by Mike Gledhill

1 Answer

You could try using a profiler to see what Entity Framework is doing with the insert. For example, if it's selecting data out of your table, it could be taking a long time to return the data over the wire, and you may not notice that locally.

I have found that the best way to load a large amount of data (both in record count and record size) into SQL Server from C# is to use the SqlBulkCopy class. Even though you are inserting only one record, you may still benefit from this change.

To use bulk copy, just create a DataTable that matches the structure of your table, then call the code like this:

using (SqlConnection destinationConnection = new SqlConnection(connectionString))
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnection))
{
    destinationConnection.Open();   //  the connection must be open before the bulk copy runs
    bulkCopy.DestinationTableName = "External_File";
    //  Map columns by name, so the table's IDENTITY key column
    //  doesn't throw the default ordinal mapping off
    foreach (DataColumn col in dataTable.Columns)
        bulkCopy.ColumnMappings.Add(col.ColumnName, col.ColumnName);
    bulkCopy.WriteToServer(dataTable);
}
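For a single large record like the one in the question, that DataTable only needs one row. A sketch, with column names taken from the question's [External_File] table (filename here stands for the path of the Excel file being saved):

//  Build a one-row DataTable matching [External_File]; the IDENTITY
//  column ExtFile_ID is deliberately left out.
DataTable dataTable = new DataTable("External_File");
dataTable.Columns.Add("ExtFile_Filename", typeof(string));
dataTable.Columns.Add("ExtFile_Data", typeof(byte[]));
dataTable.Columns.Add("ExtFile_Last_Modified", typeof(DateTime));
dataTable.Columns.Add("Update_By", typeof(string));
dataTable.Columns.Add("Update_Time", typeof(DateTime));

dataTable.Rows.Add(Path.GetFileName(filename),
                   File.ReadAllBytes(filename),
                   File.GetLastWriteTime(filename),
                   "mike",
                   DateTime.UtcNow);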
answered Oct 02 '22 by Kevin Harker