Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server - why is scanning done twice for the same table?

Does anyone know why sql server chooses to query the table 'building' twice? Is there any explanation? Can it be done with only one table seek?

This is the code sample:

DECLARE @id1stBuild INT = 1
    ,@number1stBuild INT = 2
    ,@idLastBuild INT = 5
    ,@numberLastBuild INT = 1;
DECLARE @nr TABLE (nr INT);

INSERT @nr
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

CREATE TABLE building (
    id INT PRIMARY KEY identity(1, 1)
    ,number INT NOT NULL
    ,idStreet INT NOT NULL
    ,surface INT NOT NULL
    )

INSERT INTO building (number,idStreet,surface)
SELECT bl.b
    ,n.nr
    ,abs(convert(BIGINT, convert(VARBINARY, NEWID()))) % 500
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY n1.nr) b
    FROM @nr n1
    CROSS JOIN @nr n2
    CROSS JOIN @nr n3
    ) bl
CROSS JOIN @nr n

--***** execution plan for the select below
SELECT *
FROM building b
WHERE b.id = @id1stBuild
    AND b.number = @number1stBuild
    OR b.id = @idLastBuild
    AND b.number = @numberLastBuild

DROP TABLE building

The execution plan for this is always the same: Two Clustered Index Seek unified through Merge Join (Concatenation). The rest is less important. Here is the execution plan:

enter image description here

like image 708
Emarian Avatar asked Jan 19 '15 11:01

Emarian


People also ask

What are some of the problems of doing table scans in a database?

Table scans on large tables take an excessive amount of time and cause performance problems. Make sure that, for any queries against large tables, at least one WHERE clause condition: refers to an indexed column and. is reasonably selective.

What causes table scan in SQL Server?

One common problem that exists is the lack of indexes or incorrect indexes and therefore SQL Server has to process more data to find the records that meet the queries criteria. These issues are known as Index Scans and Table Scans.

What is the difference between table scan and index scan?

Table scan means iterate over all table rows. Index scan means iterate over all index items, when item index meets search condition, table row is retrived through index. Usualy index scan is less expensive than a table scan because index is more flat than a table.


Video Answer


3 Answers

It's not scanning twice. It is seeking twice.

Your query is semantically the same as the below.

SELECT *
FROM   building b
WHERE  b.id = @id1stBuild
       AND b.number = @number1stBuild
UNION
SELECT *
FROM   building b
WHERE  b.id = @idLastBuild
       AND b.number = @numberLastBuild 

And the execution plan performs two seeks and unions the result.

like image 184
Martin Smith Avatar answered Oct 13 '22 01:10

Martin Smith


why is scanning done twice for the same table?

Is not a scan, is a seek, and that makes all the difference.

Implementing OR as a UNION, and then implementing the UNION via a MERGE JOIN. Is called a 'merge union':

Merge union

Now let’s change the query slightly:

select a from T where b = 1 or c = 3

  |--Stream Aggregate(GROUP BY:([T].[a]))
   |--Merge Join(Concatenation)
        |--Index Seek(OBJECT:([T].[Tb]), SEEK:([T].[b]=(1)) ORDERED FORWARD)
        |--Index Seek(OBJECT:([T].[Tc]), SEEK:([T].[c]=(3)) ORDERED FORWARD)

Instead of the concatenation and sort distinct operators, we now have a merge join (concatenation) and a stream aggregate. What happened? The merge join (concatenation) or “merge union” is not really a join at all. It is implemented by the same iterator as the merge join, but it really performs a union all while preserving the order of the input rows. Finally, we use the stream aggregate to eliminate duplicates. (See this post for more about using stream aggregate to eliminate duplicates.) This plan is generally a better choice since the sort distinct uses memory and could spill data to disk if it runs out of memory while the stream aggregate does not use memory.

like image 25
Remus Rusanu Avatar answered Oct 13 '22 02:10

Remus Rusanu


You can try the following, which gives only one seek and a slight performance improvement. As @Martin_Smith says what you have coded is the equivalent of a Union

SELECT *
FROM building b
WHERE b.id IN (@id1stBuild , @idLastBuild) 
    AND 
        (
            (b.id = @id1stBuild AND b.number = @number1stBuild) OR 
            (b.id = @idLastBuild AND b.number = @numberLastBuild)
        )
like image 27
Steve Ford Avatar answered Oct 13 '22 02:10

Steve Ford