Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ORDER and filter by add date while CLUSTERED IDENTITY PRIMARY KEY on row Id

I have a table (Orders) with CLUSTERED IDENTITY PRIMARY KEY (OrderId) and I am filtering and sorting the data by add date column (AddDate). Is there a way to tell the query optimizer that the AddDate is ordered the same way the OrderId is (so the data is ordered by AddDate already)?

SQL Server does not really need to Scan whole table and then sort it. All the operation requires is to Scan the table until end date is found, then filter out data before start date and return it as is (without sorting it).

Example:

SELECT
      *
    FROM Orders
    WHERE AddDate BETWEEN @FromDate AND @ToDate
    ORDER BY AddDate
like image 273
Jozef Babinsky Avatar asked Oct 02 '18 11:10

Jozef Babinsky


People also ask

Does order matter for clustered index?

Does the order of columns in a PK index matter? Yes it does. By default, the primary key constraint is enforced in SQL Server by a unique clustered index. The clustered index defines the logical order of rows in the table.

Can you have a primary key and a clustered index?

PRIMARY KEY and UNIQUE constraints When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index. The primary key column cannot allow NULL values.

Why you should never use GUIDs as part of Clustered index?

The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table.

Should primary key always be clustered index?

Nope, it can be nonclustered. However, if you don't explicitly define it as nonclustered and there is no clustered index on the table, it'll be created as clustered.


1 Answers

Is there a way to tell the query optimizer that the AddDate is ordered the same way the OrderId is (so the data is ordered by AddDate already)?

No, there isn't any way to do that.

However you can order by OrderId instead of by AddDate, if AddDate is ordered the same way as OrderId, it would return the same results. But unfortunatly, SQL Server would scan the entire table anyway.

Let't take Northwind Orders tables and OrderDate column.

The query:

SELECT *
FROM dbo.Orders
WHERE OrderDate BETWEEN '1997-12-10' AND '1998-03-05'
ORDER BY OrderDate

Produces this plan. It scans the clustered index entirely while applying a filter, then order the result.

The query:

SELECT *
FROM dbo.Orders
WHERE OrderDate BETWEEN '1997-12-10' AND '1997-12-17'
ORDER BY OrderId -- It's equivalent to ordering by OrderDate

Produces this plan. It also scans the clustered index entirely and apply a filter, but it doesn't order.

Having OrderDate as the clustered index key would drastically improve the performance of the query, but you might not want to have such a clustered index key. However you cloud create a covering index that would also drastically improve the performance:

CREATE INDEX IX_Orders_OrderDate ON dbo.Orders(OrderDate)
INCLUDE ([OrderID], [CustomerID], [EmployeeID], [RequiredDate], [ShippedDate], [ShipVia], [Freight], [ShipName], [ShipAddress], [ShipCity], [ShipRegion], [ShipPostalCode], [ShipCountry])

The query:

SELECT *
FROM dbo.Orders
WHERE OrderDate BETWEEN '1997-12-10' AND '1998-03-05'
ORDER BY OrderDate

Produces this plan. It just seeks the index. It cannot be faster.

But that index is fat, it would penalize data modifications.

However, you could take advantage of a thinner index like the following:

CREATE INDEX IX_Orders_OrderDate ON dbo.Orders(OrderDate, OrderId)

Using a query like this:

DECLARE @FromOrderId int, @ToOrderId int;
SELECT TOP (1) @FromOrderId = OrderId FROM dbo.Orders WHERE OrderDate <= '1997-12-10' ORDER BY OrderDate DESC, OrderId DESC;
SELECT TOP (1) @ToOrderId = OrderId FROM dbo.Orders WHERE OrderDate >= '1998-03-05' ORDER BY OrderDate ASC, OrderId ASC;

SELECT *
FROM dbo.Orders
WHERE 
    (OrderId  >= @FromOrderId OR @FromOrderId IS NULL)
    AND (OrderId <= @ToOrderId OR @ToOrderId IS NULL)
ORDER BY OrderID
OPTION (RECOMPILE)

It produces this plan. It just need 3 seeks to solve the query.

like image 158
Jesús López Avatar answered Nov 17 '22 21:11

Jesús López