 

Fastest way to insert in parallel to a single table

My company is cursed by a symbiotic partnership turned parasitic. To get our data from the parasite, we have to use a painfully slow ODBC connection. I recently noticed, though, that I can get more throughput by running queries in parallel (even against the same table).

There is a particularly large table that I want to extract data from and move into our local table. By running queries in parallel I can get the data faster, but I imagine that writing from multiple queries into the same table at once could cause issues.

What advice can you give me on how to best handle this situation so that I can take advantage of the increased speed of using queries in parallel?

EDIT: I've gotten some great feedback here, but I think I wasn't completely clear on the fact that I'm pulling data via a linked server (which uses the ODBC drivers). That means I can run normal INSERT statements, and I believe that would perform better than either SqlBulkCopy or BULK INSERT (in fact, I don't believe BULK INSERT is even an option here).
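Since the pulls go through a linked server, one way to parallelize is to give each session its own key range. A minimal sketch, assuming a linked server named LINKEDSRV, a remote table dbo.BigTable, and a numeric Id key (all names hypothetical):

    -- Session 1: pull the first key range (server, database, and column names are hypothetical)
    INSERT INTO dbo.LocalTable (Id, Col1, Col2)
    SELECT Id, Col1, Col2
    FROM LINKEDSRV.RemoteDb.dbo.BigTable
    WHERE Id >= 1 AND Id < 1000000;

    -- Session 2 (separate connection, run concurrently): the next key range
    INSERT INTO dbo.LocalTable (Id, Col1, Col2)
    SELECT Id, Col1, Col2
    FROM LINKEDSRV.RemoteDb.dbo.BigTable
    WHERE Id >= 1000000 AND Id < 2000000;

Without a TABLOCK hint each session takes only row/page locks, so inserts into a heap can proceed side by side; if contention does show up, each session can load its own staging table and the results can be combined afterwards.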

Brandon Moore asked Jun 19 '12


People also ask

How can I speed up my insert query?

You can use the following methods to speed up inserts: If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
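For example, three rows in one statement instead of three separate INSERTs (table and column names are hypothetical):

    -- One round trip and one statement for three rows
    INSERT INTO dbo.LocalTable (Id, Name)
    VALUES (1, 'a'),
           (2, 'b'),
           (3, 'c');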

How can I make SQL Server insert query faster?

The easiest solution is to simply batch commits, e.g. commit every 1,000 inserts, or every second. This fills up the log pages and amortizes the cost of the log-flush wait across all the inserts in a transaction.
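A minimal sketch of that pattern, assuming a hypothetical dbo.LocalTable:

    -- Commit once per 1000 rows so the log flush cost is amortized
    DECLARE @i INT = 0;
    BEGIN TRANSACTION;
    WHILE @i < 100000
    BEGIN
        INSERT INTO dbo.LocalTable (Id) VALUES (@i);
        SET @i += 1;
        IF @i % 1000 = 0
        BEGIN
            COMMIT TRANSACTION;      -- flush the log for the whole batch
            BEGIN TRANSACTION;       -- start the next batch
        END;
    END;
    IF @@TRANCOUNT > 0 COMMIT TRANSACTION;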

Can we insert data into two tables simultaneously?

You cannot insert into two tables with a single INSERT statement. But if you split the work into two INSERT statements, you get the same effect!
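In practice that just means wrapping the two statements in one transaction so they succeed or fail together (table names hypothetical):

    -- Two INSERTs, one atomic unit of work
    BEGIN TRANSACTION;
    INSERT INTO dbo.TableA (Id, Val) SELECT Id, Val FROM dbo.Source;
    INSERT INTO dbo.TableB (Id, Val) SELECT Id, Val FROM dbo.Source;
    COMMIT TRANSACTION;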


1 Answer

Have you read Load 1TB in less than 1 hour?

  1. Run as many load processes as you have available CPUs. If you have 32 CPUs, run 32 parallel loads. If you have 8 CPUs, run 8 parallel loads.
  2. If you have control over the creation of your input files, make them of a size that is evenly divisible by the number of load threads you want to run in parallel. Also make sure all records belong to one partition if you want to use the switch partition strategy.
  3. Use BULK INSERT instead of BCP if you are running the process on the SQL Server machine.
  4. Use table partitioning to gain another 8-10%, but only if your input files are GUARANTEED to match your partitioning function, meaning that all records in one file must be in the same partition.
  5. Use TABLOCK to avoid row-at-a-time locking.
  6. Use ROWS_PER_BATCH = 2500, or something near it, if you are importing multiple streams into one table (see the sketch after this list).
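Putting points 3, 5, and 6 together, a hedged sketch of one of the parallel load commands (file path, terminators, and table name are all hypothetical):

    -- Each parallel loader runs one of these against its own input file
    BULK INSERT dbo.LocalTable
    FROM 'C:\loads\chunk_01.dat'
    WITH (
        TABLOCK,                 -- BU lock: concurrent bulk loads into a heap don't block each other
        ROWS_PER_BATCH = 2500,   -- per point 6 above
        FIELDTERMINATOR = '|',
        ROWTERMINATOR = '\n'
    );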

For SQL Server 2008, there are certain circumstances where you can utilize minimal logging for a standard INSERT SELECT:

SQL Server 2008 enhances the methods that it can handle with minimal logging. It supports minimally logged regular INSERT SELECT statements. In addition, turning on trace flag 610 lets SQL Server 2008 support minimal logging against a nonempty B-tree for new key ranges that cause allocations of new pages.
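A minimal sketch of a candidate for that minimally logged path (database and table names are hypothetical; minimal logging also requires the SIMPLE or BULK_LOGGED recovery model):

    -- Under SIMPLE or BULK_LOGGED recovery, TABLOCK makes this INSERT ... SELECT
    -- eligible for minimal logging on SQL Server 2008
    INSERT INTO dbo.LocalTable WITH (TABLOCK) (Id, Col1)
    SELECT Id, Col1
    FROM dbo.StagingTable;

    -- Optionally, per the quote above:
    -- DBCC TRACEON (610, -1);  -- extend minimal logging to nonempty B-trees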

8kb answered Sep 27 '22