 

What happens in an UPDATE statement in which the updated table isn't mentioned in the FROM/JOIN clauses?

I intended to run the following UPDATE statement on a SQL Server database table:

UPDATE TABLE_A
SET COL_1=B.COL_1
FROM TABLE_A A
INNER JOIN TABLE_B B
   ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
  AND A.COL_1=91216599

By mistake, I ran the following statement instead:

UPDATE TABLE_A
SET COL_1=B.COL_1
FROM TABLE_A_COPY A
INNER JOIN TABLE_B B
   ON A.KEY_1=B.KEY_1
WHERE B.COL_2 is not NULL
  AND A.COL_1=91216599

Notice that in this second statement (wrong one), the FROM clause specifies table TABLE_A_COPY instead of TABLE_A. Both tables have exactly the same schema (i.e., same columns) and the same data (before any UPDATE is executed, that is).

Both TABLE_A and TABLE_A_COPY have about 100 million records, and the update affects about 500,000 records. The second statement (the wrong one) ran for several hours and then failed, while the first statement (the correct one) ran for 40 seconds and succeeded.

Clearly, both statements are syntactically correct, but I am not sure what exactly I asked SQL Server to do with the second statement.

My questions are:

  1. What was SQL Server trying to do in the second statement? Because of my mistake, I didn't specify any link between the records of TABLE_A and TABLE_A_COPY, so was it doing a CROSS JOIN between the two and then updating each record in TABLE_A a gazillion times?

  2. If it isn't too broad a question, what would be a valid scenario for an UPDATE statement in which the table being updated is not mentioned in the FROM/JOIN clauses? Why would anyone do that, and why would SQL Server even allow it?


I did try searching for an answer to my questions, but Google seems to think I'm asking about UPDATE FROM syntax.

asked Oct 23 '15 10:10 by Ricardo




1 Answer

1) There is no join condition between TABLE_A and TABLE_A_COPY, so you get a CROSS JOIN and the same rows are updated over and over. The result can be non-deterministic, for example if parallel execution is involved:

LiveDemo

CREATE TABLE #TABLE_A(KEY_1 INT PRIMARY KEY,COL_1 INT);

CREATE TABLE #TABLE_A_COPY(KEY_1 INT PRIMARY KEY,COL_1 INT);

CREATE TABLE #TABLE_B(KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);

INSERT INTO #TABLE_A VALUES (1,91216599),(2,91216599),(3,91216599),
                             (4,91216599),(5,91216599),(6,6);

INSERT INTO #TABLE_A_COPY VALUES (1,91216599),(2,91216599),(3,91216599),
                                 (4,91216599),(5,91216599),(6,6);    

INSERT INTO #TABLE_B VALUES (1,10,10),(2,20,20), (3,30,30);

-- Intended (correct) statement, commented out for comparison:
/*
UPDATE #TABLE_A
SET COL_1=B.COL_1
--SELECT *
FROM #TABLE_A A
INNER JOIN #TABLE_B B
   ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
  AND A.COL_1=91216599;
*/

-- Mistaken statement: the target #TABLE_A does not appear in FROM/JOIN
UPDATE #TABLE_A
SET COL_1=B.COL_1
FROM #TABLE_A_COPY A
INNER JOIN #TABLE_B B
   ON A.KEY_1=B.KEY_1
WHERE B.COL_2 is not NULL
  AND A.COL_1=91216599;

SELECT *
FROM #TABLE_A;

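Notice in the output that every row of #TABLE_A was updated, including the one with KEY_1 = 6, which has no match in #TABLE_B and should not have been touched at all.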
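To see the CROSS JOIN directly, the mistaken statement's FROM/WHERE can be run as a SELECT with the implicit target table written out. The sketch below assumes SQL Server 2016 or later (for DROP TABLE IF EXISTS), rebuilds the demo data, and uses a hypothetical alias T for the target. Every #TABLE_A row, matched or not, pairs with all three qualifying source rows. At the question's scale that is on the order of 100 million × 500,000 ≈ 5 × 10^13 logical row pairs, and all ~100 million rows of TABLE_A become update targets instead of ~500,000, which presumably explains most of the multi-hour runtime.

```sql
-- Sketch: assumes SQL Server 2016+ (DROP TABLE IF EXISTS); same data as the demo.
DROP TABLE IF EXISTS #TABLE_A, #TABLE_A_COPY, #TABLE_B;

CREATE TABLE #TABLE_A(KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #TABLE_A_COPY(KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #TABLE_B(KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);

INSERT INTO #TABLE_A VALUES (1,91216599),(2,91216599),(3,91216599),
                            (4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_A_COPY SELECT KEY_1, COL_1 FROM #TABLE_A;  -- identical copy
INSERT INTO #TABLE_B VALUES (1,10,10),(2,20,20),(3,30,30);

-- The mistaken UPDATE's FROM/WHERE, with the implicit target (alias T)
-- written out as an explicit CROSS JOIN.
SELECT T.KEY_1      AS target_key,
       COUNT(*)     AS candidate_rows,    -- source rows competing to update T
       MIN(B.COL_1) AS lowest_candidate,
       MAX(B.COL_1) AS highest_candidate
FROM #TABLE_A T
CROSS JOIN #TABLE_A_COPY A
INNER JOIN #TABLE_B B
   ON A.KEY_1 = B.KEY_1
WHERE B.COL_2 IS NOT NULL
  AND A.COL_1 = 91216599
GROUP BY T.KEY_1
ORDER BY T.KEY_1;
```

Each of the six target keys should report three candidate rows with values from 10 to 30; which of them actually lands in the row is decided by the execution plan.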

2) SQL Server's UPDATE FROM/DELETE FROM syntax is much broader than the ANSI standard, and your problem can be reduced to the same row being updated multiple times. With UPDATE you don't get any error or warning about it:

From "Let's deprecate UPDATE FROM!" and "Deprecate UPDATE FROM and DELETE FROM":

Correctness? Bah, who cares?

Well, most do. That’s why we test.

If I mess up the join criteria in a SELECT query so that too many rows from the second table match, I’ll see it as soon as I test, because I get more rows back than expected. If I mess up the subquery criteria in an ANSI standard UPDATE query in a similar way, I see it even sooner, because SQL Server will return an error if the subquery returns more than a single value. But with the proprietary UPDATE FROM syntax, I can mess up the join and never notice – SQL Server will happily update the same row over and over again if it matches more than one row in the joined table, with only the result of the last of those updates sticking. And there is no way of knowing which row that will be, since that depends on the query execution plan that happens to be chosen. A worst case scenario would be one where the execution plan just happens to result in the expected outcome during all tests on the single-processor development server – and then, after deployment to the four-way dual-core production server, our precious data suddenly hits the fan…
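For comparison, here is a sketch of the intended update written in the ANSI-standard style the quote refers to, with correlated subqueries instead of UPDATE ... FROM (again assuming SQL Server 2016+ and the demo data). Dropping the correlation, the analogue of the original mistake, makes the subquery return several values, so SQL Server stops with an error instead of silently picking one.

```sql
-- Sketch: assumes SQL Server 2016+ (DROP TABLE IF EXISTS); same data as the demo.
DROP TABLE IF EXISTS #TABLE_A, #TABLE_B;

CREATE TABLE #TABLE_A(KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #TABLE_B(KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);

INSERT INTO #TABLE_A VALUES (1,91216599),(2,91216599),(3,91216599),
                            (4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_B VALUES (1,10,10),(2,20,20),(3,30,30);

-- Intended update, ANSI style: the subquery is correlated to the target row.
UPDATE #TABLE_A
SET COL_1 = (SELECT B.COL_1
             FROM #TABLE_B B
             WHERE B.KEY_1 = #TABLE_A.KEY_1
               AND B.COL_2 IS NOT NULL)
WHERE COL_1 = 91216599
  AND EXISTS (SELECT 1                       -- skip rows with no match,
              FROM #TABLE_B B                -- otherwise they would become NULL
              WHERE B.KEY_1 = #TABLE_A.KEY_1
                AND B.COL_2 IS NOT NULL);

-- Correlation forgotten: the subquery returns 3 values, so SQL Server raises
-- error 512 ("Subquery returned more than 1 value") and changes nothing.
BEGIN TRY
    UPDATE #TABLE_A
    SET COL_1 = (SELECT B.COL_1 FROM #TABLE_B B WHERE B.COL_2 IS NOT NULL);
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;
```

The EXISTS predicate repeats the join condition on purpose: without it, the unmatched rows with KEY_1 = 4 and 5 would be set to NULL.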

If you use MERGE instead, for example, you get an error:

The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
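A sketch of the same update written as MERGE, under the same assumptions (demo data, SQL Server 2016+ for DROP TABLE IF EXISTS). The second MERGE deliberately drops the key linkage, so each remaining target row matches all three source rows and SQL Server raises the error quoted above instead of updating anything.

```sql
-- Sketch: assumes SQL Server 2016+ (DROP TABLE IF EXISTS); same data as the demo.
DROP TABLE IF EXISTS #TABLE_A, #TABLE_B;

CREATE TABLE #TABLE_A(KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #TABLE_B(KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);

INSERT INTO #TABLE_A VALUES (1,91216599),(2,91216599),(3,91216599),
                            (4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_B VALUES (1,10,10),(2,20,20),(3,30,30);

-- Intended update as MERGE: the keyed ON clause lets each target row
-- match at most one source row.
MERGE #TABLE_A AS T
USING (SELECT KEY_1, COL_1 FROM #TABLE_B WHERE COL_2 IS NOT NULL) AS S
   ON T.KEY_1 = S.KEY_1
WHEN MATCHED AND T.COL_1 = 91216599 THEN
    UPDATE SET COL_1 = S.COL_1;   -- MERGE must end with a semicolon

-- Same MERGE without the key linkage: rows 4 and 5 (still 91216599) each
-- match all three source rows, so the whole statement fails.
BEGIN TRY
    MERGE #TABLE_A AS T
    USING (SELECT KEY_1, COL_1 FROM #TABLE_B WHERE COL_2 IS NOT NULL) AS S
       ON T.COL_1 = 91216599
    WHEN MATCHED THEN
        UPDATE SET COL_1 = S.COL_1;
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;
```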

So you need to be more careful and check your code. I would also prefer an error here, but as the linked Connect item shows, that is not going to happen.
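One cheap safety net, sketched below under the assumption that you know roughly how many rows the update should touch, is to run the statement inside an explicit transaction and roll back when @@ROWCOUNT disagrees. @expected is a hypothetical value you would take from your own analysis (3 for the demo data). Its limit is worth stating: it only keeps a wrong result from being committed; it does not stop a runaway statement like the one in the question from running for hours first.

```sql
-- Sketch: assumes SQL Server 2016+ (DROP TABLE IF EXISTS); same data as the demo.
DROP TABLE IF EXISTS #TABLE_A, #TABLE_A_COPY, #TABLE_B;

CREATE TABLE #TABLE_A(KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #TABLE_A_COPY(KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #TABLE_B(KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);

INSERT INTO #TABLE_A VALUES (1,91216599),(2,91216599),(3,91216599),
                            (4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_A_COPY SELECT KEY_1, COL_1 FROM #TABLE_A;
INSERT INTO #TABLE_B VALUES (1,10,10),(2,20,20),(3,30,30);

DECLARE @expected INT = 3;   -- hypothetical: the row count you expect to change
DECLARE @affected INT;

BEGIN TRANSACTION;

-- The mistaken statement from the question
UPDATE #TABLE_A
SET COL_1 = B.COL_1
FROM #TABLE_A_COPY A
INNER JOIN #TABLE_B B
   ON A.KEY_1 = B.KEY_1
WHERE B.COL_2 IS NOT NULL
  AND A.COL_1 = 91216599;

SET @affected = @@ROWCOUNT;  -- capture immediately; later statements reset it

IF @affected <> @expected
BEGIN
    ROLLBACK TRANSACTION;    -- temp tables are transactional, so this undoes it
    PRINT CONCAT(N'Rolled back: ', @affected, N' rows affected, expected ', @expected);
END
ELSE
    COMMIT TRANSACTION;
```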

One way to avoid this mistake altogether is to reference the target table by its alias in the UPDATE clause (UPDATE A), so you are sure the table you update is one of the tables in the FROM/JOIN clause and no other table is involved:

UPDATE A
SET COL_1=B.COL_1
FROM #TABLE_A A
INNER JOIN #TABLE_B B
   ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
  AND A.COL_1=91216599;
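On question 2, a sketch of when leaving the target out of FROM is legitimate: SQL Server effectively treats the target table as one more input to the join, so the statement is safe when the WHERE clause ties the target to the FROM tables on a key, or when the FROM clause yields a single row. The example assumes SQL Server 2016+ and the demo data; #TABLE_A.KEY_1 refers to the target row being updated.

```sql
-- Sketch: assumes SQL Server 2016+ (DROP TABLE IF EXISTS); same data as the demo.
DROP TABLE IF EXISTS #TABLE_A, #TABLE_B;

CREATE TABLE #TABLE_A(KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #TABLE_B(KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);

INSERT INTO #TABLE_A VALUES (1,91216599),(2,91216599),(3,91216599),
                            (4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_B VALUES (1,10,10),(2,20,20),(3,30,30);

-- Target not listed in FROM, but correlated through WHERE on #TABLE_B's
-- primary key, so each target row matches at most one source row.
UPDATE #TABLE_A
SET COL_1 = B.COL_1
FROM #TABLE_B B
WHERE B.KEY_1 = #TABLE_A.KEY_1
  AND B.COL_2 IS NOT NULL
  AND #TABLE_A.COL_1 = 91216599;

-- Single-row source: the implicit cross join gives every target row the
-- same value, so the result is deterministic.
UPDATE #TABLE_A
SET COL_1 = S.MAX_COL_1
FROM (SELECT MAX(COL_1) AS MAX_COL_1 FROM #TABLE_B) S
WHERE #TABLE_A.COL_1 = 91216599;
```

The alias form above is still easier to review, because the target table visibly takes part in the join.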
answered Sep 24 '22 01:09 by Lukasz Szozda