Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UPDATE vs UPDATE WHERE

I have a table with many rows, where I periodically want to set one column to 0 using a cron.

What is faster / less memory consuming, doing an UPDATE on all rows (ie. no WHERE clause) or doing an UPDATE only WHERE mycolumn != 0?

like image 680
Eduard Luca Avatar asked Oct 21 '22 17:10

Eduard Luca


1 Answers

As noticed in comments on the original post, it depends on several things (index, database engine, type of storage media, available cache memory, etc.).

We could make an educated guess that:

a) We should always have a full-table scan unless we have an index on the column (and I'd not recommend one just for the sake of this query, as you will penalize down your online writes to benefit an offline process);

b) The "cost" for checking the values is highly amortized by the savings of not writing unchanged records (unless the database implicitly does so, which may or may not happen) if the majority of records will be zeroes.

but assumptions start to pile up, so I'd rather measure instead. To play a bit, I've:

  • Created a test table with a "status" numeric column

  • Filled it with a few million records (e.g., using a script like the one in https://stackoverflow.com/a/17268740/64635)

  • Set it up with different values, then tried to UPDATE the column to 0, with and without the WHERE.

My results (which may differ from yours) were that the WHERE query was way faster if there were indeed few non-zero records. E.g., after setting up the table with either of

UPDATE myTable SET myColumn = 1;                /* All values non-zero (1) */
UPDATE myTable SET myColumn = FLOOR(RAND()*10); /* ~90% values non-zero */

both WHERE and non-WHERE updates to 0 were slow (and no noticeable difference between them, implying "a" above is true), whereas after any of

UPDATE myTable SET myColumn = 0;                      /* All values zero */
UPDATE myTable SET myColumn = IF(id % 500 = 0, 1, 0); /* 99.8% values zero */

the UPDATE with WHERE was insanely faster (as implied by "b").

I'd recommend trying these tests (and even others, including the index if you really wish) on your setup (e.g., creating a separate table and running tests like these) and consider your data set (measure/estimate the % of records that will be non-zero when your cron job runs). Keep in mind you likely want to optimize for cost/availability (including your time as a cost) instead of finding the absolutely most performant solution in the universe (which is likely not cost-effective), and you will surely find the best solution. Good luck!

like image 123
chesterbr Avatar answered Nov 04 '22 21:11

chesterbr