The order of the columns in a composite index does matter on how a query against a table will use it or not. A query will use a composite index only if the where clause of the query has at least the leading/left most columns of the index in it.
If you have ever seen any index creation syntax, you may notice that there is sort order in index column. Many often wonder if it really matters for the performance of query or not. Let me give you one-word answer – yes it does matter in some specific scenarios and let us see the example.
A clustered index may be the fastest for one SELECT statement but it may not necessarily be correct choice. SQL Server indices are b-trees. A non-clustered index just contains the indexed columns, with the leaf nodes of the b-tree being pointers to the approprate data page.
The ORDER BY keyword is used to sort the result-set in ascending or descending order. The ORDER BY keyword sorts the records in ascending order by default. To sort the records in descending order, use the DESC keyword.
This primarily matters when used with composite indexes:
CREATE INDEX ix_index ON mytable (col1, col2 DESC);
can be used for either:
SELECT *
FROM mytable
ORDER BY
col1, col2 DESC
or:
SELECT *
FROM mytable
ORDER BY
col1 DESC, col2
, but not for:
SELECT *
FROM mytable
ORDER BY
col1, col2
An index on a single column can be efficiently used for sorting in both ways.
See the article in my blog for details:
Update:
In fact, this can matter even for a single column index, though it's not so obvious.
Imagine an index on a column of a clustered table:
CREATE TABLE mytable (
pk INT NOT NULL PRIMARY KEY,
col1 INT NOT NULL
)
CREATE INDEX ix_mytable_col1 ON mytable (col1)
The index on col1
keeps ordered values of col1
along with the references to rows.
Since the table is clustered, the references to rows are actually the values of the pk
. They are also ordered within each value of col1
.
This means that that leaves of the index are actually ordered on (col1, pk)
, and this query:
SELECT col1, pk
FROM mytable
ORDER BY
col1, pk
needs no sorting.
If we create the index as following:
CREATE INDEX ix_mytable_col1_desc ON mytable (col1 DESC)
, then the values of col1
will be sorted descending, but the values of pk
within each value of col1
will be sorted ascending.
This means that the following query:
SELECT col1, pk
FROM mytable
ORDER BY
col1, pk DESC
can be served by ix_mytable_col1_desc
but not by ix_mytable_col1
.
In other words, the columns that constitute a CLUSTERED INDEX
on any table are always the trailing columns of any other index on that table.
For a true single column index it makes little difference from the Query Optimiser's point of view.
For the table definition
CREATE TABLE T1( [ID] [int] IDENTITY NOT NULL,
[Filler] [char](8000) NULL,
PRIMARY KEY CLUSTERED ([ID] ASC))
The Query
SELECT TOP 10 *
FROM T1
ORDER BY ID DESC
Uses an ordered scan with scan direction BACKWARD
as can be seen in the Execution Plan. There is a slight difference however in that currently only FORWARD
scans can be parallelised.
However it can make a big difference in terms of logical fragmentation. If the index is created with keys descending but new rows are appended with ascending key values then you can end up with every page out of logical order. This can severely impact the size of the IO reads when scanning the table and it is not in cache.
See the fragmentation results
avg_fragmentation avg_fragment
name page_count _in_percent fragment_count _size_in_pages
------ ------------ ------------------- ---------------- ---------------
T1 1000 0.4 5 200
T2 1000 99.9 1000 1
for the script below
/*Uses T1 definition from above*/
SET NOCOUNT ON;
CREATE TABLE T2( [ID] [int] IDENTITY NOT NULL,
[Filler] [char](8000) NULL,
PRIMARY KEY CLUSTERED ([ID] DESC))
BEGIN TRAN
GO
INSERT INTO T1 DEFAULT VALUES
GO 1000
INSERT INTO T2 DEFAULT VALUES
GO 1000
COMMIT
SELECT object_name(object_id) AS name,
page_count,
avg_fragmentation_in_percent,
fragment_count,
avg_fragment_size_in_pages
FROM
sys.dm_db_index_physical_stats(db_id(), object_id('T1'), 1, NULL, 'DETAILED')
WHERE index_level = 0
UNION ALL
SELECT object_name(object_id) AS name,
page_count,
avg_fragmentation_in_percent,
fragment_count,
avg_fragment_size_in_pages
FROM
sys.dm_db_index_physical_stats(db_id(), object_id('T2'), 1, NULL, 'DETAILED')
WHERE index_level = 0
It's possible to use the spatial results tab to verify the supposition that this is because the later pages have ascending key values in both cases.
SELECT page_id,
[ID],
geometry::Point(page_id, [ID], 0).STBuffer(4)
FROM T1
CROSS APPLY sys.fn_PhysLocCracker( %% physloc %% )
UNION ALL
SELECT page_id,
[ID],
geometry::Point(page_id, [ID], 0).STBuffer(4)
FROM T2
CROSS APPLY sys.fn_PhysLocCracker( %% physloc %% )
The sort order matters when you want to retrieve lots of sorted data, not individual records.
Note that (as you are suggesting with your question) the sort order is typically far less significant than what columns you are indexing (the system can read the index in reverse if the order is opposite what it wants). I rarely give index sort order any thought, whereas I agonize over the columns covered by the index.
@Quassnoi provides a great example of when it does matter.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With