I have a for loop that goes through millions of objects. What would be the suggested way to commit this? Here are a few examples I was thinking of:
# after each
for item in items:
cursor.execute()
conn.commit()
# at the end
for item in items:
cursor.execute()
conn.commit()
# after N items
for n, item in enumerate(items):
cursor.execute()
if n % N == 0:
conn.commit()
conn.commit()
Which of the above would be the most efficient?
You're asking, should I commit ...
First of all, if you're working with MyISAM, don't worry about it. MyISAM doesn't implement transactions.
If your system won't be corrupted by committing only some of your row changes, you should commit after every N items. That's because committing after every item will slow down your processing a lot. Only committing once at the end of the huge operation is likely either to blow out your rollback space, or take an unspeakably long time during which your other data base users will have slowdowns.
I've been through this countless times. Any N greater than about 20 will do fine.
Arguably the "at the end" one, since you BEGIN
and COMMIT
only once and it's one single transaction with its own scope. It's easier from concurrency point of view: basically the transaction says: this is my table now, don't touch it anyone.
If you commit multiple times (the other two solutions), you BEGIN
and COMMIT
a transaction many times (there's a next transaction started right after you commit). This means more chance of interruption from other ongoing DB operations. Also - those operations take time themselves.
You should run a benchmark simulating your use-case, though. I'd be curious to know if depending on some conditions (number of rows to insert, session configuration, data type, indexes used) one or the other solution may prevail.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With