Does anyone know why sql server chooses to query the table 'building' twice? Is there any explanation? Can it be done with only one table seek?
This is the code sample:
DECLARE @id1stBuild INT = 1
,@number1stBuild INT = 2
,@idLastBuild INT = 5
,@numberLastBuild INT = 1;
DECLARE @nr TABLE (nr INT);
INSERT @nr
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
CREATE TABLE building (
id INT PRIMARY KEY identity(1, 1)
,number INT NOT NULL
,idStreet INT NOT NULL
,surface INT NOT NULL
)
INSERT INTO building (number,idStreet,surface)
SELECT bl.b
,n.nr
,abs(convert(BIGINT, convert(VARBINARY, NEWID()))) % 500
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY n1.nr) b
FROM @nr n1
CROSS JOIN @nr n2
CROSS JOIN @nr n3
) bl
CROSS JOIN @nr n
--***** execution plan for the select below
SELECT *
FROM building b
WHERE b.id = @id1stBuild
AND b.number = @number1stBuild
OR b.id = @idLastBuild
AND b.number = @numberLastBuild
DROP TABLE building
The execution plan for this is always the same: Two Clustered Index Seek unified through Merge Join (Concatenation). The rest is less important. Here is the execution plan:
Table scans on large tables take an excessive amount of time and cause performance problems. Make sure that, for any queries against large tables, at least one WHERE clause condition: refers to an indexed column and. is reasonably selective.
One common problem that exists is the lack of indexes or incorrect indexes and therefore SQL Server has to process more data to find the records that meet the queries criteria. These issues are known as Index Scans and Table Scans.
Table scan means iterate over all table rows. Index scan means iterate over all index items, when item index meets search condition, table row is retrived through index. Usualy index scan is less expensive than a table scan because index is more flat than a table.
It's not scanning twice. It is seeking twice.
Your query is semantically the same as the below.
SELECT *
FROM building b
WHERE b.id = @id1stBuild
AND b.number = @number1stBuild
UNION
SELECT *
FROM building b
WHERE b.id = @idLastBuild
AND b.number = @numberLastBuild
And the execution plan performs two seeks and unions the result.
why is scanning done twice for the same table?
Is not a scan, is a seek, and that makes all the difference.
Implementing OR as a UNION, and then implementing the UNION via a MERGE JOIN. Is called a 'merge union':
Merge union
Now let’s change the query slightly:
select a from T where b = 1 or c = 3 |--Stream Aggregate(GROUP BY:([T].[a])) |--Merge Join(Concatenation) |--Index Seek(OBJECT:([T].[Tb]), SEEK:([T].[b]=(1)) ORDERED FORWARD) |--Index Seek(OBJECT:([T].[Tc]), SEEK:([T].[c]=(3)) ORDERED FORWARD)
Instead of the concatenation and sort distinct operators, we now have a merge join (concatenation) and a stream aggregate. What happened? The merge join (concatenation) or “merge union” is not really a join at all. It is implemented by the same iterator as the merge join, but it really performs a union all while preserving the order of the input rows. Finally, we use the stream aggregate to eliminate duplicates. (See this post for more about using stream aggregate to eliminate duplicates.) This plan is generally a better choice since the sort distinct uses memory and could spill data to disk if it runs out of memory while the stream aggregate does not use memory.
You can try the following, which gives only one seek and a slight performance improvement. As @Martin_Smith says what you have coded is the equivalent of a Union
SELECT *
FROM building b
WHERE b.id IN (@id1stBuild , @idLastBuild)
AND
(
(b.id = @id1stBuild AND b.number = @number1stBuild) OR
(b.id = @idLastBuild AND b.number = @numberLastBuild)
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With