I have two tables [LogTable] and [LogTable_Cross].
Below is the schema and script to populate them:
--Main Table
CREATE TABLE [dbo].[LogTable]
(
[LogID] [int] NOT NULL
IDENTITY(1, 1) ,
[DateSent] [datetime] NULL,
)
ON [PRIMARY]
GO
ALTER TABLE [dbo].[LogTable] ADD CONSTRAINT [PK_LogTable] PRIMARY KEY CLUSTERED ([LogID]) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent] ON [dbo].[LogTable] ([DateSent] DESC) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent_LogID] ON [dbo].[LogTable] ([DateSent] DESC) INCLUDE ([LogID]) ON [PRIMARY]
GO
--Cross table
CREATE TABLE [dbo].[LogTable_Cross]
(
[LogID] [int] NOT NULL ,
[UserID] [int] NOT NULL
)
ON [PRIMARY]
GO
ALTER TABLE [dbo].[LogTable_Cross] WITH NOCHECK ADD CONSTRAINT [FK_LogTable_Cross_LogTable] FOREIGN KEY ([LogID]) REFERENCES [dbo].[LogTable] ([LogID])
GO
CREATE NONCLUSTERED INDEX [IX_LogTable_Cross_UserID_LogID]
ON [dbo].[LogTable_Cross] ([UserID])
INCLUDE ([LogID])
GO
-- Script to populate them
INSERT INTO [LogTable]
SELECT TOP 100000
DATEADD(day, ( ABS(CHECKSUM(NEWID())) % 65530 ), 0)
FROM sys.sysobjects
CROSS JOIN sys.all_columns
INSERT INTO [LogTable_Cross]
SELECT [LogID] ,
1
FROM [LogTable]
ORDER BY NEWID()
INSERT INTO [LogTable_Cross]
SELECT [LogID] ,
2
FROM [LogTable]
ORDER BY NEWID()
INSERT INTO [LogTable_Cross]
SELECT [LogID] ,
3
FROM [LogTable]
ORDER BY NEWID()
GO
I want to select all logs (from LogTable) that belong to a given user ID (checked against the cross table LogTable_Cross), ordered by DateSent descending.
SELECT DI.LogID
FROM LogTable DI
INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID
WHERE DP.UserID = 1
ORDER BY DateSent DESC
After running this query, here is my execution plan:
As you can see, there is a Sort operator in the plan, probably because of the line "ORDER BY DateSent DESC".
My question is: why does the Sort operator appear in the plan even though I have the following indexes on the table?
GO
CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent] ON [dbo].[LogTable] ([DateSent] DESC) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent_LogID] ON [dbo].[LogTable] ([DateSent] DESC) INCLUDE ([LogID]) ON [PRIMARY]
GO
On the other hand, if I remove the join and write the query this way:
SELECT DI.LogID
FROM LogTable DI
-- INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID
--WHERE DP.UserID = 1
ORDER BY DateSent DESC
the plan changes to
i.e. the Sort operator is gone and the plan shows that the query uses my nonclustered index.
So is there a way to remove the Sort operator from the plan for my query even when I use the join?
EDIT:
I went further and limited "Max Degree of Parallelism" to 1,
then ran the following query again:
SELECT DI.LogID
FROM LogTable DI
INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID
WHERE DP.UserID = 1
ORDER BY DateSent DESC
and the plan still has the Sort operator:
Edit 2
Even with the following index, as suggested:
CREATE NONCLUSTERED INDEX [IX_LogTable_Cross_UserID_LogID_2]
ON [dbo].[LogTable_Cross] ([UserID], [LogID])
the plan still has the Sort operator:
The second query of yours does not contain the UserID condition, so it is not equivalent to the first. The indexes on LogTable cannot cover the first query because UserID is not part of them, and the join still has to be performed. SQL Server therefore has to join the tables (hash join, merge join, or nested loops join). It correctly picks the hash join, since the intermediate results are large and not sorted by LogID. If you supply the intermediate result sorted by LogID (your second edit), it uses a merge join instead, but a sort on DateSent is still needed. The only solution without a sort is to create an indexed (materialized) view:
CREATE VIEW vLogTable
WITH SCHEMABINDING
AS
SELECT DI.LogID, DI.DateSent, DP.UserID
FROM dbo.LogTable DI
INNER JOIN dbo.LogTable_Cross DP ON DP.LogID = DI.LogID
GO
CREATE UNIQUE CLUSTERED INDEX CIX_vCustomerOrders
ON dbo.vLogTable(UserID, DateSent, LogID);
The view has to be queried with the NOEXPAND hint so that the optimizer can find the CIX_vCustomerOrders index:
SELECT LogID
FROM dbo.vLogTable WITH(NOEXPAND)
WHERE UserID = 1
ORDER BY DateSent DESC
This query is equivalent to your first query. You can check its correctness by inserting the following row:
INSERT INTO LogTable VALUES (CURRENT_TIMESTAMP)
After that, my query still returns the correct result (100000 rows with the sample data), whereas your second query returns 100001 rows. You can delete or insert other rows as well; the view stays up to date and my query keeps returning correct results.
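One way to run such a check is to compare row counts. The sketch below assumes the view, tables, and sample data defined above; the column aliases (ViewRows, JoinRows, AllLogRows) are only illustrative names.

```sql
-- Rows for user 1 as seen through the indexed view.
SELECT COUNT(*) AS ViewRows
FROM dbo.vLogTable WITH (NOEXPAND)
WHERE UserID = 1;

-- Rows for user 1 from the original join; this should match ViewRows.
SELECT COUNT(*) AS JoinRows
FROM dbo.LogTable DI
INNER JOIN dbo.LogTable_Cross DP ON DP.LogID = DI.LogID
WHERE DP.UserID = 1;

-- Every row in LogTable, which is what the query without the join returns.
SELECT COUNT(*) AS AllLogRows
FROM dbo.LogTable;
```

The first two counts should agree. The third exceeds them by the number of log rows that have no LogTable_Cross entry for user 1.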
You get a Sort operator with the join because of parallelism in the earlier steps. When SQL Server processes rows on multiple threads, their order is no longer deterministic. Each thread simply pushes its results to the next operator in the pipeline (Hash Match in your case).
Since the order is not determined and you are asking for an order, SQL Server has to sort the result.
You can try adding the OPTION (MAXDOP 1) query hint to force SQL Server to run the query on a single thread. This might help in this case, but it can also degrade performance.
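As a sketch of where that hint goes, it is appended to the statement in an OPTION clause. The query reuses the tables from the question. Whether the Sort operator actually disappears depends on the plan the optimizer chooses, so treat this as something to try, not a guaranteed fix.

```sql
-- Limit this one statement to a single thread; other queries are unaffected.
SELECT DI.LogID
FROM dbo.LogTable DI
INNER JOIN dbo.LogTable_Cross DP ON DP.LogID = DI.LogID
WHERE DP.UserID = 1
ORDER BY DI.DateSent DESC
OPTION (MAXDOP 1);
```

Unlike changing the server-wide "max degree of parallelism" setting, the hint applies only to the statement that carries it.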
The second query can be satisfied with an index scan: the keys in an index are ordered by definition, and that order matches the requested one. SQL Server estimated that running the query on one thread and reading the data in index order is cheaper than reading it on multiple threads and sorting it afterwards.