Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoid sort operator in index plan

I have two tables [LogTable] and [LogTable_Cross].

Below is the schema and script to populate them:

 --Main Table

 CREATE TABLE [dbo].[LogTable]
    (
      [LogID] [int] NOT NULL
                    IDENTITY(1, 1) ,
      [DateSent] [datetime] NULL,
    )
 ON [PRIMARY]
GO
 ALTER TABLE [dbo].[LogTable] ADD CONSTRAINT [PK_LogTable] PRIMARY KEY CLUSTERED  ([LogID]) ON [PRIMARY]
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent] ON [dbo].[LogTable] ([DateSent] DESC) ON [PRIMARY]
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent_LogID] ON [dbo].[LogTable] ([DateSent] DESC) INCLUDE ([LogID]) ON [PRIMARY]
GO


--Cross table

 CREATE TABLE [dbo].[LogTable_Cross]
    (
      [LogID] [int] NOT NULL ,
      [UserID] [int] NOT NULL
    )
 ON [PRIMARY]
GO
 ALTER TABLE [dbo].[LogTable_Cross] WITH NOCHECK ADD CONSTRAINT [FK_LogTable_Cross_LogTable] FOREIGN KEY ([LogID]) REFERENCES [dbo].[LogTable] ([LogID])
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_Cross_UserID_LogID]
 ON [dbo].[LogTable_Cross] ([UserID])
 INCLUDE ([LogID])
GO


-- Script to populate them
 INSERT INTO [LogTable]
        SELECT TOP 100000
                DATEADD(day, ( ABS(CHECKSUM(NEWID())) % 65530 ), 0)
        FROM    sys.sysobjects
                CROSS JOIN sys.all_columns


 INSERT INTO [LogTable_Cross]
        SELECT  [LogID] ,
                1
        FROM    [LogTable]
        ORDER BY NEWID()

 INSERT INTO [LogTable_Cross]
        SELECT  [LogID] ,
                2
        FROM    [LogTable]
        ORDER BY NEWID()

 INSERT INTO [LogTable_Cross]
        SELECT  [LogID] ,
                3
        FROM    [LogTable]
        ORDER BY NEWID()


GO

I want to select all those logs (from LogTable) which has given userid (user id will be checked from cross table LogTable_Cross) with datesent desc.

SELECT  DI.LogID              
FROM    LogTable DI              
        INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID  
        WHERE  DP.UserID = 1  
ORDER BY DateSent DESC

After running this query here is my execution plan: enter image description here

As you can see there is a sort operator coming in role and that should be probably because of following line "ORDER BY DateSent DESC"

My question is that why that Sort operator is coming in the plan even though I have the following index applied on the table

GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent] ON [dbo].[LogTable] ([DateSent] DESC) ON [PRIMARY]
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent_LogID] ON [dbo].[LogTable] ([DateSent] DESC) INCLUDE ([LogID]) ON [PRIMARY]
GO

On the other hand if I remove the join and write the query in this way:

SELECT  DI.LogID              
FROM    LogTable DI              
  --      INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID  
        --WHERE  DP.UserID = 1  
ORDER BY DateSent DESC

the plan changes to

enter image description here

i.e Sort operator is removed and the plan is showing that my query is using my non clustered index.

So is that a way to remove "Sort" operator in the plan for my query even if I am using join.

EDIT:

I went further and limited the "Max Degree of Parallelism" to 1

enter image description here

Ran the following query again:

SELECT  DI.LogID              
FROM    LogTable DI              
        INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID  
        WHERE  DP.UserID = 1  
ORDER BY DateSent DESC

and the plan is still having that Sort operator:

enter image description here

Edit 2

Even if I have the following index as suggested:

 CREATE NONCLUSTERED INDEX [IX_LogTable_Cross_UserID_LogID_2]
 ON [dbo].[LogTable_Cross] ([UserID], [LogID])

the plan is still having the Sort operator: enter image description here

like image 472
Raghav Avatar asked Apr 19 '17 08:04

Raghav


People also ask

How do I stop sorting in SQL Server?

The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified. Using OFFSET 0 ROWS and no FETCH clause means return everything. Query the view again without an ORDER BY clause: SELECT * FROM Sales.

How can reduce sort cost in execution plan in SQL Server?

For this reason, we can use indexes to eliminate the costly sort operations in the queries. However, using indexes can decrease the performance of the insert, update and delete statements and they also increase disk space usage of the database files.

Does index sort order matter?

For composite indixes, ordering matters. For example, an index key pattern { a: 1, b: 1 } can support a sort on { a: 1, b: 1 } but not on { b: 1, a: 1 } . For a query to use a compound index for a sort, the specified sort direction for all keys in the cursor.

Why is sorting order by expensive?

Sorting data is an expensive operation because it entails loading part or all of the data into memory and shifting that data back and forth a couple of times.


2 Answers

The second query of yours does not contain the UserId condition and therefore it is not an equivalent query. The reason why the first query is not covered by your indexes on LogTable is the fact, that UserId is not present in them (and you need to perform the join as well). Therefore, SQL Server has to join the tables (Hash Join, Merge Join or Nested-Loop join). SQL Server correctly selects the Hash Join, since the intermediate results are large and they are not sorted according to the LogID. If you give them the intermediate result sorted according to the LogID (your second edit) then he uses merge join, however, sort according to the DateSend is stil needed. The only solution without sort is to create an indexed materialized view:

CREATE VIEW vLogTable
WITH SCHEMABINDING
AS
   SELECT  DI.LogID, DI.DateSent, DP.UserID           
   FROM dbo.LogTable DI              
   INNER JOIN dbo.LogTable_Cross DP ON DP.LogID = DI.LogID  

CREATE UNIQUE CLUSTERED INDEX CIX_vCustomerOrders 
   ON dbo.vLogTable(UserID, DateSent, LogID);

The view has to be used with noexpand hint, so the optimizer can find the CIX_vCustomerOrders index:

SELECT  LogID              
FROM dbo.vLogTable   WITH(NOEXPAND)
    WHERE  UserID = 1  
ORDER BY DateSent DESC

This query is equivalent query to your first query. You may check the correctness if you insert the following row:

INSERT INTO LogTable VALUES (CURRENT_TIMESTAMP)

then my query still returns the correct result (10000 rows), however, your second query returns 10001 rows. You may try to delete or insert some other rows and the view will still be up-to-date and you recieve correct results from my query.

like image 171
Radim Bača Avatar answered Oct 13 '22 23:10

Radim Bača


You have sort operation when you have the join because of the parallelism in the previous steps. When SQL Server processes the records in multiple threads, the order is not determined anymore. Each thread just pushes the results to the next item in the pipeline (Hash match in your case).

Since the order is not determined and you are asking for an order, SQL Server has to sort the result.

You can try to add the MAXDOP = 1 hint to force SQL Server to run the query using only one thread. This might help in this case, but can cause performance degradation too.

The second query can be satisfied using an index scan and the index is ordered and that order is the same as the requested one. The records (keys) in the index are ordered by definition. SQL Server guessed that running the query on one thread and just reading the data using the index is more beneficial than reading the data using multiple threads and sorting them later.

like image 43
Pred Avatar answered Oct 14 '22 00:10

Pred