 

Best practices for inserting/updating large amount of data in SQL Server 2008

I'm building a system for updating large amounts of data through various CSV feeds. Normally I would just loop through each row in the feed, run a SELECT query to check whether the item already exists, and then insert or update it depending on the result.

I feel this method isn't very scalable and could hammer the server on larger feeds. My solution is to loop through the items as normal but store them in memory. Then, for every 100 or so items, run a single SELECT for that batch to get the list of matching items already in the database, concatenate the insert/update statements together, and run them against the database. This would essentially cut down on the round trips to the database.

Is this a scalable enough solution, and are there any example tutorials on importing large feeds into a production environment?

Thanks

Mark Clancy asked Feb 26 '10


1 Answer

Seeing that you're using SQL Server 2008, I would recommend this approach:

  • first bulk-copy your CSV files into a staging table
  • update your target table from that staging table using the MERGE command
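For the first step, a minimal sketch using BULK INSERT might look like this. The table, column, and file names here are placeholders for illustration; they are not from the original post, and the CSV options will need adjusting to your actual feed format:

```sql
-- Hypothetical staging table; columns are illustrative only.
CREATE TABLE dbo.ProductStaging (
    ProductID INT           NOT NULL,
    Name      NVARCHAR(100) NULL,
    Price     DECIMAL(10,2) NULL
);

-- Load the whole CSV in one bulk operation instead of row-by-row inserts.
BULK INSERT dbo.ProductStaging
FROM 'C:\feeds\products.csv'     -- placeholder path
WITH (
    FIELDTERMINATOR = ',',       -- CSV column separator
    ROWTERMINATOR   = '\n',      -- one row per line
    FIRSTROW        = 2          -- skip the header row, if the feed has one
);
```

Alternatively, the `bcp` command-line utility or SSIS can do the same bulk load from outside the database.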

Check out the MSDN docs and a great blog post on how to use the MERGE command.

Basically, you create a link between your actual data table and the staging table on a common criterion (e.g. a shared primary key), and then you can define what to do when

  • the rows match, i.e. the row exists in both the source and the target table --> typically you'd either update some fields or just ignore it altogether
  • the row from the source doesn't exist in the target --> typically a case for an INSERT

You would have a MERGE statement something like this:

MERGE TargetTable AS t
USING SourceTable AS src
ON t.PrimaryKey = src.PrimaryKey

WHEN NOT MATCHED THEN
  INSERT (list of columns)
  VALUES (list of values)

WHEN MATCHED THEN
  UPDATE
    SET (list of column = value assignments)
;

Of course, the ON clause can be much more involved if needed. And of course, your WHEN statements can also be more complex, e.g.

WHEN MATCHED AND (some other condition) THEN ......

and so forth.
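Filling in the skeleton above with concrete (hypothetical) tables gives something like the following. The `Products` / `ProductStaging` names and columns are illustrative assumptions, not from the original post:

```sql
-- Hypothetical concrete MERGE; table and column names are placeholders.
MERGE dbo.Products AS t
USING dbo.ProductStaging AS src
    ON t.ProductID = src.ProductID

-- Row exists only in the feed: insert it.
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, Name, Price)
    VALUES (src.ProductID, src.Name, src.Price)

-- Row exists in both, and something actually changed: update it.
WHEN MATCHED AND (t.Name <> src.Name OR t.Price <> src.Price) THEN
    UPDATE
        SET t.Name  = src.Name,
            t.Price = src.Price
;   -- note: a MERGE statement must be terminated with a semicolon
```

The extra `AND (...)` condition on the `WHEN MATCHED` branch avoids rewriting rows that haven't changed; if your columns are nullable, the comparison would need NULL handling as well.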

MERGE is a very powerful and very useful new command in SQL Server 2008 - use it, if you can!

marc_s answered Oct 14 '22