I'm writing a big script, full of update statements that will run daily. Some of this updates will affect rows others won't.
I believe that update statements that won't affect anything is not very good practice, but my question is:
Thank you, Tiago
First and foremost, SELECT followed by UPDATE is almost always incorrect under concurrency. Unless things run under SERIALIZABLE isolation level, there is no guarantee that rows don't change between the SELECT and the UPDATE.
Second, for the cases when there are rows to update, the cost of SELECT + UPDATE is by definition higher than just an UPDATE.
And last, for the case when there aren't rows to update the cost of locating them by UPDATE is often the same as the cost of locating them by SELECT, so you didn't gain anything. I say 'often', not 'always', because the query optimizer may consider different strategies for update vs. select and a scan for update occurs under different locking (concurrency) rules than a scan for read.
What may make sense is to have a very cheap SELECT that can avoid an expensive UPDATE, even if not 100% accurate. The conditions between the SELECT and UPDATE can vary wildly As long as you don't get any false negatives (the SELECT say there shouldn't be be any row but UPDATE would had found row, had it been run) and the number of false positives is low (SELECT says there are rows, but the more precise/expensive UPDATE check doesn't actually find any).
Ultimately this is a problem of optimization and all optimization questions start by measuring. First locate the expensive UPDATE and then start to modify.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With