How does SQL Server choose values in an UPDATE statement where there are multiple options?

Tags:

sql-server

I have an UPDATE statement in SQL Server where there are four possible values that can be assigned based on the join. It appears that SQL Server has an algorithm for choosing one value over another, and I'm not sure how that algorithm works.

As an example, say there is a table called Source with two columns (Match and Data) structured as below: (The match column contains only 1's, the Data column increments by 1 for every row)
Match  Data
-----  ----
1      1
1      2
1      3
1      4

That table will update another table called Destination with the same two columns structured as below:
Match  Data
-----  ----
1      NULL

If you want to update the Data column in Destination in the following way:

UPDATE Destination
SET    Data = Source.Data
FROM   Destination
       INNER JOIN Source
               ON Destination.Match = Source.Match

there will be four possible values that Destination.Data could be set to after this query is run. I've found that changing the indexes on Source has an impact on what Destination is set to, and it appears that SQL Server just updates the Destination table with the first matching value it finds.

Is that accurate? Is it possible that SQL Server is updating the Destination with every possible value sequentially and I end up with the same kind of result as if it were updating with the first value it finds? It seems to be possibly problematic that it will seemingly randomly choose one row to update, as opposed to throwing an error when presented with this situation.

Thank you.

P.S. I apologize for the poor formatting. Hopefully, the intent is clear.

asked Sep 02 '09 22:09

Fntastic

2 Answers

It sets all of the results to the Data. Which one you end up with after the query depends on the order of the results returned (which one it sets last).

Since there's no ORDER BY clause, you're left with whatever order SQL Server comes up with. That will normally follow the physical order of the records on disk, and that in turn typically follows the clustered index for a table. But this order isn't set in stone, particularly when joins are involved. If a join matches on a column with an index other than the clustered index, it may well order the results based on that index instead. In the end, unless you give it an ORDER BY clause, SQL Server will return the results in whatever order it thinks it can do fastest.

You can play with this by turning your update query into a select query, so you can see the results. Notice which record comes first and which record comes last in the source table for each record of the destination table. Compare that with the results of your update query. Then play with your indexes again and check the results once more to see what you get.
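As a rough sketch of that experiment, the join from the question can be run as a SELECT. The table variables and the column aliases CurrentData and CandidateData are assumptions made here so the batch is self-contained; substitute the real tables to test index changes.

```sql
-- Sample data from the question, held in table variables (an assumption
-- for self-containment; the real tables would have indexes to experiment with)
DECLARE @Source TABLE (Match INT, Data INT);
INSERT INTO @Source VALUES (1, 1), (1, 2), (1, 3), (1, 4);

DECLARE @Destination TABLE (Match INT, Data INT);
INSERT INTO @Destination VALUES (1, NULL);

-- The UPDATE's join rewritten as a SELECT:
-- one output row per candidate value for each destination row
SELECT Destination.Match,
       Destination.Data AS CurrentData,
       Source.Data      AS CandidateData
FROM   @Destination Destination
       INNER JOIN @Source Source
               ON Destination.Match = Source.Match;
```

For this sample the query returns four rows, one per candidate. The order in which they come back is not guaranteed without an ORDER BY.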

Of course, it can be tricky here because UPDATE statements are not allowed to use an ORDER BY clause, so regardless of what you find, you should really write the join so it matches the destination table 1:1. You may find the APPLY operator useful in achieving this goal, and you can use it to effectively JOIN to another table and guarantee the join only matches one record.
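A minimal sketch of the APPLY approach might look like the following. The alias Picked and the choice of "largest Data wins" (ORDER BY Source.Data DESC) are assumptions for illustration; use whatever ordering expresses the row you actually want.

```sql
DECLARE @Source TABLE (Match INT, Data INT);
INSERT INTO @Source VALUES (1, 1), (1, 2), (1, 3), (1, 4);

DECLARE @Destination TABLE (Match INT, Data INT);
INSERT INTO @Destination VALUES (1, NULL);

-- CROSS APPLY runs the subquery once per destination row;
-- TOP (1) with an ORDER BY limits it to a single, explicitly chosen source row
UPDATE Destination
SET    Data = Picked.Data
FROM   @Destination Destination
       CROSS APPLY (SELECT TOP (1) Source.Data
                    FROM   @Source Source
                    WHERE  Source.Match = Destination.Match
                    ORDER  BY Source.Data DESC) AS Picked;

SELECT *
FROM @Destination;
```

With this sample data the destination row ends up with Data = 4. Destination rows with no match are left untouched, because CROSS APPLY drops them from the join.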

answered Sep 19 '22 17:09

Joel Coehoorn

The choice is not deterministic and it can be any of the source rows.

You can try

DECLARE @Source TABLE(Match INT, Data INT);

INSERT INTO @Source
VALUES
(1, 1),
(1, 2),
(1, 3),
(1, 4);

DECLARE @Destination TABLE(Match INT, Data INT);

INSERT INTO @Destination
VALUES
(1, NULL);


UPDATE Destination
SET    Data = Source.Data
FROM   @Destination Destination
       INNER JOIN @Source Source
               ON Destination.Match = Source.Match; 

SELECT *
FROM @Destination;

And look at the actual execution plan. I see the following.

The output columns from @Destination are Bmk1000, Match. Bmk1000 is an internal row identifier (used here due to lack of clustered index in this example) and would be different for each row emitted from @Destination (if there was more than one).

The single row is then joined onto the four matching rows in @Source and the resultant four rows are passed into a stream aggregate.

The stream aggregate groups by Bmk1000 and collapses the multiple matching rows down to one. The operation performed by this aggregate is ANY(@Source.[Data]).

The ANY aggregate is an internal aggregate function not available in TSQL itself. No guarantees are made about which of the four source rows will be chosen.

Finally the single row per group feeds into the UPDATE operator to update the row with whatever value the ANY aggregate returned.
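Since no error is raised in this situation, one option is to look for ambiguous matches before running the update. The check below is only a sketch; the column alias CandidateRows is made up for illustration.

```sql
DECLARE @Source TABLE (Match INT, Data INT);
INSERT INTO @Source VALUES (1, 1), (1, 2), (1, 3), (1, 4);

DECLARE @Destination TABLE (Match INT, Data INT);
INSERT INTO @Destination VALUES (1, NULL);

-- List every Match value that is present in the destination
-- and has more than one candidate row in the source
SELECT Source.Match,
       COUNT(*) AS CandidateRows
FROM   @Source Source
WHERE  EXISTS (SELECT 1
               FROM   @Destination Destination
               WHERE  Destination.Match = Source.Match)
GROUP  BY Source.Match
HAVING COUNT(*) > 1;
```

For the sample data this returns a single row (Match = 1, CandidateRows = 4). An empty result would mean each destination row has at most one matching source row.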

If you want deterministic results then you can use an aggregate function yourself...

WITH GroupedSource AS
(
SELECT Match,
       MAX(Data) AS Data
FROM @Source
GROUP BY Match
)
UPDATE Destination
SET    Data = Source.Data
FROM   @Destination Destination
       INNER JOIN GroupedSource Source
               ON Destination.Match = Source.Match;

Or use ROW_NUMBER...

WITH RankedSource AS
(
SELECT Match,
      Data,
      ROW_NUMBER() OVER (PARTITION BY Match ORDER BY Data DESC) AS RN
FROM @Source
)
UPDATE Destination
SET    Data = Source.Data
FROM   @Destination Destination
       INNER JOIN RankedSource Source
               ON Destination.Match = Source.Match
WHERE RN = 1;

The latter form is generally more useful: if you need to set multiple columns, it ensures that all the values used come from the same source row. For the result to be deterministic, the combination of PARTITION BY and ORDER BY columns should be unique.
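To illustrate both points, here is a sketch that sets two columns from one source row. The SourceId and Note columns are hypothetical additions to the example tables. SourceId acts as a tiebreaker, so that PARTITION BY Match together with ORDER BY Data DESC, SourceId identifies exactly one row per group.

```sql
-- SourceId and Note are hypothetical columns added for this illustration
DECLARE @Source TABLE (SourceId INT PRIMARY KEY, Match INT, Data INT, Note VARCHAR(10));
INSERT INTO @Source
VALUES (1, 1, 4, 'first'),
       (2, 1, 4, 'second'),  -- ties with SourceId 1 on Data
       (3, 1, 3, 'third');

DECLARE @Destination TABLE (Match INT, Data INT, Note VARCHAR(10));
INSERT INTO @Destination VALUES (1, NULL, NULL);

WITH RankedSource AS
(
SELECT Match,
       Data,
       Note,
       -- Data alone is not unique within a Match group, so SourceId breaks ties
       ROW_NUMBER() OVER (PARTITION BY Match ORDER BY Data DESC, SourceId) AS RN
FROM @Source
)
UPDATE Destination
SET    Data = Source.Data,
       Note = Source.Note
FROM   @Destination Destination
       INNER JOIN RankedSource Source
               ON Destination.Match = Source.Match
WHERE  Source.RN = 1;

SELECT *
FROM @Destination;
```

Here the destination row receives Data = 4 and Note = 'first', both from the row with SourceId = 1. Without the SourceId tiebreaker, either of the two rows with Data = 4 could be numbered 1.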

answered Sep 21 '22 17:09

Martin Smith
