Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Better to commit inside or outside a loop?

Tags:

database

Maybe there's no simple answer to this question, but I ask in case someone has, if not a simple answer, at least an insight.

I've had a number of occasions where I create a loop that goes through many records in a database table performing some update, and where I could legitimately do one big commit at the end, or commit each record as I processed it. i.e. committing one at a time would not create any data integrity issues.

Is there a clear case for which is better?

What brings it to mind is that I had one such program that I recently switched from a single big commit to a bunch of little commits because it was a fairly long running program -- about 80 minutes -- and it failed half way through on bad data. I fixed the problem and re-ran, but then it had to start over again from the beginning when I could have had it just process the previously unprocessed records.

I noticed when I made this change that the run time was about the same either way.

like image 409
Jay Avatar asked Oct 08 '10 17:10

Jay


People also ask

What is the difference between commit in loop and commit outside loop?

Committing outside the loop is the exact opposite. Which is faster depends on the average operation count and amount of data to be committed overall. For a routine that persists about 10-20 records, commit outside the loop. For 1m-2m records, I'd commit in batches.

Which is faster to commit in a loop or batch?

Which is faster depends on the average operation count and amount of data to be committed overall. For a routine that persists about 10-20 records, commit outside the loop. For 1m-2m records, I'd commit in batches.

Is the commit inside a loop slowing down the system?

My point (not very well made) was that the commit inside a loop is probably not the main problem - the pl/sql may not wait for the commit - yes if its a busy system then we are probably slowing other things down. But look to see where the time is going before you blame the commit - which is a grubby habit

Can you commit to multiple rows in a for loop?

If you do commit inside the for loop, regardless of the number of rows between commit, you must close and reopen the cursor - selecting only unprocessed rows. That is not true. No it is not technically true but ajallen has a valid point.


2 Answers

Assuming that the ability to rollback the entire persistence is not needed (in which case there is only one answer; commit outside), committing inside the loop keeps the transaction log smaller, but requires more roundtrips to the DB. Committing outside the loop is the exact opposite. Which is faster depends on the average operation count and amount of data to be committed overall. For a routine that persists about 10-20 records, commit outside the loop. For 1m-2m records, I'd commit in batches.

like image 159
KeithS Avatar answered Oct 05 '22 17:10

KeithS


I think the answer is do you need to rollback all if one fails? If yes, put the transaction outside, otherwise put it inside. Of course I almost never would write a loop to do an update anyway except to process fairly large batches of records. If you are doing row-by-row updates, there are better, more performant methods.

like image 45
HLGEM Avatar answered Oct 05 '22 17:10

HLGEM