Strategy to optimize this large SQL insert via C#?

I have about 1.5 million files I need to insert records for in the database. Each record is inserted with a key that includes the name of the file.

The catch: The files are not uniquely identified currently.

So, what we'd like to do is, for each file:

  • Insert a record. One of the fields in the record should hold an Amazon S3 key that includes the ID of the newly inserted record.
  • Rename the file to include the ID so that it matches the format of the key.

The best thing I can think to do is:

  • Run an individual insert command that returns the ID of the added row.
  • Add that back as a property to the individual business object I'm looping through.
  • Generate an update statement that updates the S3 key to include the ID
  • Output the file, appending the ID to the end of the file name.

As far as I can tell, that amounts to:

  • 1.5 million insert statements
    • (individual SqlCommand executions and reads, because we need the ID back)
  • 1.5 million times setting a property on an object
  • 1.5 million update statements generated and executed
    • Perhaps this could be one giant concatenated update statement that does them all at once; not sure if that helps
  • 1.5 million file copies.

I can't get around the actual file part, but for the rest, is there a better strategy I'm not seeing?

SeanKilleen asked Sep 18 '13 15:09




1 Answer

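If you make the client application generate the IDs, you can use a straightforward SqlBulkCopy to insert all rows at once. Because each ID is known before the insert, the S3 key can be computed up front and no separate update is needed. It will be done in seconds.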
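A minimal sketch of that approach, under stated assumptions: the table `dbo.Files`, its columns `Id`, `FileName`, and `S3Key`, the key format, and the helper names are all hypothetical and do not come from the question. The IDs are assigned on the client starting from a caller-supplied `firstId`, so every row already carries its final S3 key when it is sent.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient; // on newer .NET this lives in the Microsoft.Data.SqlClient package
using System.IO;

public static class FileBulkLoader
{
    // Hypothetical key format: "files/<id>_<original file name>".
    public static string BuildS3Key(long id, string fileName)
    {
        return "files/" + id + "_" + fileName;
    }

    // Assigns consecutive IDs starting at firstId and precomputes each S3 key.
    public static DataTable BuildTable(IEnumerable<string> filePaths, long firstId)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(long));
        table.Columns.Add("FileName", typeof(string));
        table.Columns.Add("S3Key", typeof(string));

        long id = firstId;
        foreach (string path in filePaths)
        {
            string name = Path.GetFileName(path);
            table.Rows.Add(id, name, BuildS3Key(id, name));
            id++;
        }
        return table;
    }

    // Sends the whole table to the server in one bulk operation.
    public static void BulkInsert(string connectionString, DataTable table)
    {
        // KeepIdentity asks SQL Server to store the Id values supplied here
        // rather than generating new ones (relevant if Id is an IDENTITY column).
        using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
        {
            bulk.DestinationTableName = "dbo.Files"; // hypothetical table name
            bulk.BatchSize = 10000;
            foreach (DataColumn column in table.Columns)
            {
                bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            }
            bulk.WriteToServer(table);
        }
    }
}
```

The same `Id` value stored in each row can then be used when renaming the corresponding file, so the database never has to be queried for it.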

If you want to keep the IDENTITY property of the column, you can run DBCC CHECKIDENT with the RESEED option to advance the identity counter by 1.5 million, which gives you a guaranteed gap of IDs that you can insert into. If the number of rows is not known up front, you can do the inserting in smaller chunks of maybe 100k, reserving a gap for each chunk, until you are done.
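The gap reservation and chunking could be sketched roughly as follows. This assumes a hypothetical table `dbo.Files` that already contains rows and uses an identity increment of 1; `IdentityGap`, `ReserveIds`, and `Batches` are illustrative names. Reading the current identity value and reseeding are two separate statements here, so the sketch also assumes nothing else inserts into the table in between.

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient; // on newer .NET this lives in the Microsoft.Data.SqlClient package
using System.Globalization;

public static class IdentityGap
{
    // Splits any sequence into lists of at most batchSize items,
    // so the total number of rows need not be known in advance.
    public static IEnumerable<List<T>> Batches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch;
    }

    // Advances the identity counter by 'count' and returns the first ID
    // of the reserved gap. Requires an open connection.
    public static long ReserveIds(SqlConnection connection, long count)
    {
        long current;
        // IDENT_CURRENT returns the last identity value generated for the table.
        using (var read = new SqlCommand(
            "SELECT CAST(IDENT_CURRENT('dbo.Files') AS bigint)", connection))
        {
            current = (long)read.ExecuteScalar();
        }

        // After the reseed, the next server-generated ID is current + count + 1,
        // leaving current + 1 .. current + count free for explicit inserts.
        string newSeed = (current + count).ToString(CultureInfo.InvariantCulture);
        using (var reseed = new SqlCommand(
            "DBCC CHECKIDENT ('dbo.Files', RESEED, " + newSeed + ")", connection))
        {
            reseed.ExecuteNonQuery();
        }
        return current + 1;
    }
}
```

Each batch of, say, 100,000 files would call `ReserveIds(connection, batch.Count)` once, and the returned value becomes the starting ID for that batch's rows before they are bulk-inserted with their identity values preserved.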

usr answered Sep 28 '22 18:09
