I have about 1.5 million files for which I need to insert records into the database. Each record is inserted with a key that includes the name of the file.
The catch: The files are not uniquely identified currently.
So, what we'd like to do is, for each file:
The best thing I can think to do is:
As far as I can tell, that looks to be:
I can't get around the actual file part, but for the rest, is there a better strategy I'm not seeing?
To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.
Some good ways to improve BULK INSERT operations: use TABLOCK as a query hint, drop indexes before the bulk load and recreate them once it has completed, and switch the database recovery model to BULK_LOGGED for the duration of the load.
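For example, a .NET client could wrap those steps around one large load roughly like this (a minimal sketch: the table dbo.FileRecord and index IX_FileRecord_FileName are hypothetical names, and Microsoft.Data.SqlClient can be swapped for System.Data.SqlClient):

    using System;
    using Microsoft.Data.SqlClient;   // System.Data.SqlClient exposes the same types

    class BulkLoadTuning
    {
        static void Run(string connectionString)
        {
            using var conn = new SqlConnection(connectionString);
            conn.Open();

            // Switch to BULK_LOGGED recovery for the duration of the load.
            Exec(conn, "ALTER DATABASE CURRENT SET RECOVERY BULK_LOGGED;");

            // Drop secondary indexes before loading.
            Exec(conn, "DROP INDEX IX_FileRecord_FileName ON dbo.FileRecord;");

            // ... perform the bulk load here, e.g. BULK INSERT ... WITH (TABLOCK)
            // or SqlBulkCopy with SqlBulkCopyOptions.TableLock ...

            // Recreate the indexes and restore the normal recovery model.
            Exec(conn, "CREATE NONCLUSTERED INDEX IX_FileRecord_FileName ON dbo.FileRecord(FileName);");
            Exec(conn, "ALTER DATABASE CURRENT SET RECOVERY FULL;");
        }

        static void Exec(SqlConnection conn, string sql)
        {
            using var cmd = new SqlCommand(sql, conn);
            cmd.ExecuteNonQuery();
        }
    }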
If you make the client application generate the IDs, you can use a straightforward SqlBulkCopy to insert all rows at once. It will be done in seconds.
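A sketch of that approach, assuming a plain (non-IDENTITY) bigint Id column and a table named dbo.FileRecord, both placeholder names:

    using System;
    using System.Data;
    using System.IO;
    using Microsoft.Data.SqlClient;

    class BulkCopyLoad
    {
        // Inserts one row per file, with the key generated on the client side.
        static void Load(string connectionString, string[] filePaths, long startId)
        {
            var table = new DataTable();
            table.Columns.Add("Id", typeof(long));
            table.Columns.Add("FileName", typeof(string));

            long id = startId;
            foreach (var path in filePaths)
                table.Rows.Add(id++, Path.GetFileName(path));

            // TableLock gives the TABLOCK behavior mentioned above.
            using var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock);
            bulk.DestinationTableName = "dbo.FileRecord";
            bulk.BatchSize = 100_000;   // stream in chunks instead of one giant batch
            bulk.ColumnMappings.Add("Id", "Id");
            bulk.ColumnMappings.Add("FileName", "FileName");
            bulk.WriteToServer(table);
        }
    }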
If you want to keep the IDENTITY property of the column, you can run DBCC CHECKIDENT(RESEED) to advance the identity counter by 1.5m, giving you a guaranteed gap that you can insert into. If the number of rows is not statically known, you can perform the insert in smaller chunks of maybe 100k until you are done.
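A sketch of reserving such a gap, again against a hypothetical dbo.FileRecord table whose Id column is an IDENTITY:

    using System;
    using Microsoft.Data.SqlClient;

    class IdentityGapReservation
    {
        // Reserves blockSize identity values and returns the first reserved ID.
        static long ReserveBlock(string connectionString, long blockSize)
        {
            using var conn = new SqlConnection(connectionString);
            conn.Open();

            // Read the last identity value handed out for the table.
            long current;
            using (var cmd = new SqlCommand(
                "SELECT CAST(IDENT_CURRENT('dbo.FileRecord') AS bigint);", conn))
            {
                current = (long)cmd.ExecuteScalar();
            }

            // Advance the counter past the reserved block; normal inserts continue
            // above the gap, and the client fills current+1 .. current+blockSize.
            using (var cmd = new SqlCommand(
                $"DBCC CHECKIDENT('dbo.FileRecord', RESEED, {current + blockSize});", conn))
            {
                cmd.ExecuteNonQuery();
            }

            return current + 1;
        }
    }

The bulk copy that fills the gap would pass SqlBulkCopyOptions.KeepIdentity so the explicit IDs are written as-is. Note that other writers inserting between the read and the reseed could land inside the gap, so this sketch assumes no concurrent inserts into the table during the load.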