 

What Are the Performance Differences Between Running One vs Many Inserts

Tags: sql, perl

I'm building a script that will need to insert multiple rows. I'm writing it in Perl, so in terms of parameterization it's much easier to insert each row individually. In terms of speed, I'm guessing a single insert statement will be faster (although latency will be relatively low, as I'm quite close to the database itself). I expect each run of the script to insert about 20-40 rows on average. That said, what would be the approximate performance difference between running a single multi-row INSERT INTO statement vs. running one INSERT for each row? Note: The server is running SQL Server 2008.

[EDIT]Since there seems to be a lot of confusion, I'd like to clarify that what I'm really asking for is the theory behind how a multi-row insert is handled by SQL Server 2008. Does it essentially just convert it internally into a bunch of individual insert statements and run those over one connection, or does it do something more intelligent?

Yes, I know I can run timed loops. No, that's not what I'm asking for. [/EDIT]

Eli asked Apr 24 '26 03:04


2 Answers

Combining multiple inserts into one statement will almost always execute faster than running separate inserts. The reasons are:

  • A lot of work goes into parsing the SQL; with the multi-row version, there is only one parsing effort.
  • More work goes into checking permissions; again, that is done only once.
  • Database connections are "chatty": every statement costs its own request/response exchange, and with the multi-row version that exchange happens only once. You really notice this on a poor network connection.
  • Finally, the multi-row version gives the server an opportunity to optimize the operation as a whole.
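Since the question raises parameterization as the main obstacle, here is a minimal sketch of building one parameterized multi-row INSERT from a list of rows. It is written in Python rather than Perl, and the function name `build_multirow_insert` and the `orders` table are hypothetical. It assumes the driver uses `?` placeholders (as ODBC-style drivers and Perl's DBI do) and guards against SQL Server's documented limits of 1,000 rows per VALUES list and roughly 2,100 parameters per request.

```python
def build_multirow_insert(table, columns, rows):
    """Return (sql, params) for a single multi-row parameterized INSERT.

    `table` and `columns` are interpolated as-is, so they must come from
    trusted code, never from user input. Only the values are parameterized.
    """
    if not rows:
        raise ValueError("need at least one row")
    width = len(columns)
    if any(len(row) != width for row in rows):
        raise ValueError("every row must have one value per column")
    # SQL Server limits: 1,000 rows per VALUES list, ~2,100 parameters.
    if len(rows) > 1000 or len(rows) * width > 2100:
        raise ValueError("too many rows or parameters for one statement")

    row_placeholder = "(" + ", ".join(["?"] * width) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(rows)),
    )
    # Flatten the rows so the values line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    sql, params = build_multirow_insert(
        "orders", ["id", "qty"], [(1, 10), (2, 20)]
    )
    print(sql)
    print(params)
```

At 20-40 rows with a handful of columns, a batch stays well under both limits, so a single statement per run should be enough.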
Bohemian answered Apr 26 '26 19:04


The general idea is to let the SQL database do its thing rather than treating it as some sort of disk read. I've often seen a developer read from one table, then another, or run a general query and then loop through each row to see if it's the one they want. Generally, it's better to let the database do that work.

In this case, I can't really see much of an advantage to a single multi-row insert over multiple single-row inserts. I guess there might be some, because you don't have to do multiple prepares and commits.
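The "multiple prepares and commits" cost can also be reduced while keeping one INSERT per row. The sketch below prepares a single parameterized statement, executes it once per row, and commits once at the end. It uses Python's built-in `sqlite3` module purely as a stand-in engine (an assumption, not SQL Server), and the table `t` and both function names are hypothetical.

```python
import sqlite3


def make_demo_table():
    """Create an in-memory stand-in database with a two-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
    return conn


def insert_rows_one_commit(conn, rows):
    """Run the same parameterized INSERT for each row, then commit once."""
    with conn:  # commits on success, rolls back if an exception is raised
        # executemany reuses one statement for every row in `rows`
        conn.executemany("INSERT INTO t (id, val) VALUES (?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]


if __name__ == "__main__":
    conn = make_demo_table()
    print(insert_rows_one_commit(conn, [(1, 10), (2, 20), (3, 30)]))
    conn.close()
```

This keeps the simple per-row parameter binding the question prefers while avoiding a separate commit for every row.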

It shouldn't be too difficult to actually create a temporary database and try this out. Create a table with two columns, and have the program generate data to toss into it. Give yourself a decent amount to do. For example, how many items will this table have? And how many do you think you'll be inserting at once? Say, create a table of 1,000,000 items, and insert into it 1000 items at a time, 100 items at a time, and one item at a time. Just generate the data using the increment operator. There may be a "sweet spot" in the number of items you can insert at once.
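A rough sketch of that experiment follows, scaled down so it runs in well under a second. It uses Python's built-in `sqlite3` in-memory database as a stand-in (an assumption: the timings will not transfer to SQL Server 2008, only the shape of the harness does), and the function name and batch sizes are hypothetical. Batches are capped at 100 rows here to stay under SQLite's per-statement parameter limit on older builds.

```python
import sqlite3
import time


def time_batched_inserts(total_rows, batch_size):
    """Insert total_rows two-column rows in batches.

    Returns (elapsed_seconds, rows_in_table).
    """
    conn = sqlite3.connect(":memory:")  # stand-in engine, NOT SQL Server
    conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
    start = time.perf_counter()
    for first in range(0, total_rows, batch_size):
        ids = range(first, min(first + batch_size, total_rows))
        placeholders = ", ".join(["(?, ?)"] * len(ids))
        # Generated data: an incrementing id and a value derived from it.
        params = [v for i in ids for v in (i, i * 2)]
        conn.execute(
            "INSERT INTO t (id, val) VALUES " + placeholders, params
        )
        conn.commit()  # one commit per statement, like autocommit
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    for batch_size in (1, 10, 100):
        elapsed, count = time_batched_inserts(10_000, batch_size)
        print(f"batch={batch_size:>3}  rows={count}  {elapsed:.4f}s")
```

To get numbers that mean anything for the original question, the same loop would need to run against the real server over the real network path.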

In my unbiased and always correct opinion, you'll probably find that the difference isn't worth fretting over, and you should instead use whichever method makes your code the easiest to maintain.

I have a programming dictum: The place where you want to optimize your code is probably the wrong place. We like efficiency, but we usually attack the wrong item. And whatever we've squeezed out in terms of efficiency, we end up wasting in maintenance.

So, just program what is the easiest to understand and don't fret about being overly efficient.

David W. answered Apr 26 '26 19:04