
Faster way to update 250k rows with SQL

Tags:

c#

sql

ado.net

I need to update about 250k rows in a table, and each row gets a different value that is determined externally (not calculated from the row's id or key).

I tried a parametrized query, but it turns out to be slow. (I could still try a table-valued parameter, SqlDbType.Structured, on SQL Server 2008, but I'd like a general way to do it that works on several databases, including MySql, Oracle and Firebird.)

Concatenating all the individual updates into one huge batch is also slow (but about 2 times faster than making thousands of individual calls, i.e. round trips, using parametrized queries).

What about creating a temp table and running an UPDATE that joins my table with the temp one? Would that be faster?

pablo asked Apr 23 '10

3 Answers

How slow is "slow"?

The main problem is that a single huge update creates an enormous entry in the database's log file: in case there's a power failure half-way through the update, the database must log each action so that it can roll back in the event of failure. That is most likely where the "slowness" is coming from, more than anything else. (Obviously, with such a large number of rows there are other ways to make the thing inefficient, e.g. doing one DB round trip per update would be unbearably slow; I'm just saying that once you eliminate the obvious things, you'll still find it's pretty slow.)

There are a few ways you can do it more efficiently. One is to do the update in chunks, say 1,000 rows at a time. That way the database writes lots of small log entries rather than one really huge one.
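The chunking idea works with any client library. The question is tagged c#/ado.net, but here is a minimal runnable sketch in Python with sqlite3 (the `items` table and `price` column are made-up names), committing after every 1,000 rows so each transaction, and therefore each log entry, stays small:

```python
import sqlite3

def update_in_chunks(conn, new_values, chunk_size=1000):
    """Apply (new_value, row_id) pairs in separate small transactions."""
    cur = conn.cursor()
    for start in range(0, len(new_values), chunk_size):
        chunk = new_values[start:start + chunk_size]
        cur.executemany("UPDATE items SET price = ? WHERE id = ?", chunk)
        conn.commit()  # one small transaction per chunk, not one huge one

# demo setup: a table with 5,000 rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, 0.0) for i in range(5000)])
conn.commit()

# externally computed values, one per row
updates = [(i * 1.5, i) for i in range(5000)]
update_in_chunks(conn, updates)
```

On a real server you would additionally tune the chunk size: too small and round-trip overhead dominates again, too large and you are back to one huge log entry.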

Another way would be to turn off, or turn "down", the database's logging for the duration of the update. In SQL Server, for example, you can set the recovery model to "simple" or "bulk-logged", which speeds it up considerably (with the caveat that you are more at risk if there's a power failure or something during the update).

Edit: Just to expand a little more, probably the most efficient way to actually execute the queries in the first place is to do a BULK INSERT of all the new values into a temporary table, and then run a single UPDATE of the existing table from that (or do the UPDATE in chunks of 1,000 as I said above). Most of my answer was addressing the problem once you've implemented it like that: you'll still find it's pretty slow...
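The temp-table approach can be sketched like this, again in Python with sqlite3 and made-up table names (on a real server you would load the temp table with the engine's bulk path, e.g. BULK INSERT on SQL Server or LOAD DATA INFILE on MySQL, rather than `executemany`). A correlated subquery is used for the join-update because it is the most portable form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, 0.0) for i in range(250)])

# 1) bulk-load the externally computed values into a temp table
conn.execute("CREATE TEMP TABLE new_prices (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO new_prices VALUES (?, ?)",
                 [(i, i * 2.0) for i in range(250)])

# 2) one set-based UPDATE from the temp table instead of 250 round trips
conn.execute("""
    UPDATE items
       SET price = (SELECT price FROM new_prices
                     WHERE new_prices.id = items.id)
     WHERE id IN (SELECT id FROM new_prices)
""")
conn.commit()
```

The WHERE clause matters: without it, rows with no match in the temp table would be set to NULL.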

Dean Harding answered Oct 18 '22


Call a stored procedure if possible.

Jon Black answered Oct 18 '22


If the columns being updated are part of any indexes, you could:

  • drop these indexes
  • do the update
  • re-create the indexes.

If you need these indexes to retrieve the data, well, it doesn't help.
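The three steps above can be sketched as follows, using Python with sqlite3 and made-up index/table names (the same DROP INDEX / CREATE INDEX pattern applies on the other engines, though the exact DDL syntax varies):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
conn.execute("CREATE INDEX idx_items_price ON items (price)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, 0.0) for i in range(100)])

# 1) drop the index on the column being updated
conn.execute("DROP INDEX idx_items_price")

# 2) do the update without per-row index maintenance
conn.executemany("UPDATE items SET price = ? WHERE id = ?",
                 [(i * 3.0, i) for i in range(100)])

# 3) re-create the index in one pass
conn.execute("CREATE INDEX idx_items_price ON items (price)")
conn.commit()
```

The win comes from replacing 250k incremental index updates with one index rebuild; whether that pays off depends on the engine and the index size.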

Luc M answered Oct 18 '22