I intended to run the following UPDATE
statement on a SQL Server database table:
UPDATE TABLE_A
SET COL_1=B.COL_1
FROM TABLE_A A
INNER JOIN TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
AND A.COL_1=91216599
By mistake, I ran the following statement instead:
UPDATE TABLE_A
SET COL_1=B.COL_1
FROM TABLE_A_COPY A
INNER JOIN TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 is not NULL
AND A.COL_1=91216599
Notice that in this second statement (wrong one), the FROM
clause specifies table TABLE_A_COPY
instead of TABLE_A
. Both tables have exactly the same schema (i.e., same columns) and the same data (before any UPDATE
is executed, that is).
Both TABLE_A
and TABLE_A_COPY
have about 100 million records and the update affects about 500,000 records. The second statement (the wrong one) runs for several hours and fails while the 1st statement (the correct one) runs for 40 seconds and succeeds.
Clearly, both statements are syntactically correct, but I am not sure what exactly I asked SQL Server to do with the first statement.
My questions are:
What SQL Server was trying to do in the second statement? With my mistake I didn't specify the linkage between records from TABLE_A
to TABLE_A_COPY
, so was it trying to do a CROSS JOIN between the two, and then update each record in TABLE_A
a gazillion times?
If it isn't too broad a question to ask, what would be a valid scenario for such an UPDATE
statement in which the table being updated is not mentioned in the FROM/JOIN
clauses. Why would anyone do that? Why would SQL Server even allow that?
I did try searching for an answer to my questions, but Google seems to think I'm asking about UPDATE FROM
syntax.
Without WHERE clauses, DELETE drops all the data from the table, and UPDATE overwrites values for all the table rows.
The UPDATE command in SQL is used to modify or change the existing records in a table. If we want to update a particular value, we use the WHERE clause along with the UPDATE clause. If you do not use the WHERE clause, all the rows will be affected.
If you do not use where clause in UPDATE statement, all the records in the table will be updated with the specified values.
Just as with UPDATE , the WHERE clause in a DELETE statement is optional. If omitted, all the rows in the table will be deleted.
1) There is no connection between TABLE_A
and TABLE_A_COPY
so you will get CROSS JOIN
and massive update the same row. Result can be non-deterministic if parallel execution is involed:
LiveDemo
CREATE TABLE #TABLE_A(KEY_1 INT PRIMARY KEY,COL_1 INT);
CREATE TABLE #TABLE_A_COPY(KEY_1 INT PRIMARY KEY,COL_1 INT);
CREATE TABLE #TABLE_B(KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);
INSERT INTO #TABLE_A VALUES (1,91216599),(2,91216599),(3,91216599),
(4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_A_COPY VALUES (1,91216599),(2,91216599),(3,91216599),
(4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_B VALUES (1,10,10),(2,20,20), (3,30,30);
/*
UPDATE #TABLE_A
SET COL_1=B.COL_1
--SELECT *
FROM #TABLE_A A
INNER JOIN #TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
AND A.COL_1=91216599;
*/
UPDATE #TABLE_A
SET COL_1=B.COL_1
FROM #TABLE_A_COPY A
INNER JOIN #TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 is not NULL
AND A.COL_1=91216599
SELECT *
FROM #TABLE_A;
Check in above code how TABLE_A
record with KEY_1 = 6
changed.
2)
SQL Server UPDATE FROM/DELETE FROM
syntax is much more broad than ANSI standard, the problem you encounter can be reduced to multiple update the same row. With UPDATE
you don't get any error or warning:
From Let's deprecate UPDATE FROM!
and Deprecate UPDATE FROM and DELETE FROM
:
Correctness? Bah, who cares?
Well, most do. That’s why we test.
If I mess up the join criteria in a SELECT query so that too many rows from the second table match, I’ll see it as soon as I test, because I get more rows back then expected. If I mess up the subquery criteria in an ANSI standard UPDATE query in a similar way, I see it even sooner, because SQL Server will return an error if the subquery returns more than a single value. But with the proprietary UPDATE FROM syntax, I can mess up the join and never notice – SQL Server will happily update the same row over and over again if it matches more than one row in the joined table, with only the result of the last of those updates sticking. And there is no way of knowing which row that will be, since that depends in the query execution plan that happens to be chosen. A worst case scenario would be one where the execution plan just happens to result in the expected outcome during all tests on the single-processor development server – and then, after deployment to the four-way dual-core production server, our precious data suddenly hits the fan…
If you use for example MERGE
you will get error indicating:
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
So you need to be more carefull and check your code. I wish also to get error but as you see in connect link this won't happen.
One way to avoid this is to use UPDATE alias
so you are sure you use tables that take part in FROM JOIN
and no other tables are involved.:
UPDATE A
SET COL_1=B.COL_1
FROM #TABLE_A A
INNER JOIN #TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
AND A.COL_1=91216599;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With