Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Do clustered index on a column GUARANTEES returning sorted rows according to that column [duplicate]

Tags:

sql

sql-server

I am unable to get clear cut answers on this contentious question . MSDN documentation mentions

Clustered

  • Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.

  • The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.

While I see most of the answers

  • Does a SELECT query always return rows in the same order? Table with clustered index
  • http://sqlwithmanoj.com/2013/06/02/clustered-index-do-not-guarantee-physically-ordering-or-sorting-of-rows/

answering negative.

What is it ?

like image 540
Mudassir Hasan Avatar asked Dec 05 '22 04:12

Mudassir Hasan


2 Answers

Just to be clear. Presumably, you are talking about a simple query such as:

select *
from table t;

First, if all the data on the table fits on a single page and there are no other indexes on the table, it is hard for me to imagine a scenario where the result set is not ordered by the primary key. However, this is because I think the most reasonable query plan would require a full-table scan, not because of any requirement -- documented or otherwise -- in SQL or SQL Server. Without an explicit order by, the ordering in the result set is a consequence of the query plan.

That gets to the heart of the issue. When you are talking about the ordering of the result sets, you are really talking about the query plan. And, the assumption of ordering by the primary key really means that you are assuming that the query uses full-table scan. What is ironic is that people make the assumption, without actually understanding the "why". Furthermore, people have a tendency to generalize from small examples (okay, this is part of the basis of human intelligence). Unfortunately, they see consistently that results sets from simple queries on small tables are always in primary key order and generalize to larger tables. The induction step is incorrect in this example.

What can change this? Off-hand, I think that a full table scan would return the data in primary key order if the following conditions are met:

  • Single threaded server.
  • Single file filegroup
  • No competing indexes
  • No table partitions

I'm not saying this is always true. It just seems reasonable that under these circumstances such a query would use a full table scan starting at the beginning of the table.

Even on a small table, you can get surprises. Consider:

select NonPrimaryKeyColumn
from table

The query plan would probably decide to use an index on table(NonPrimaryKeyColumn) rather than doing a full table scan. The results would not be ordered by the primary key (unless by accident). I show this example because indexes can be used for a variety of purposes, not just order by or where filtering.

If you use a multi-threaded instance of the database and you have reasonably sized tables, you will quickly learn that results without an order by have no explicit ordering.

And finally, SQL Server has a pretty smart optimizer. I think there is some reluctance to use order by in a query because users think it will automatically do a sort. SQL Server works hard to find the best execution plan for the query. IF it recognizes that the order by is redundant because of the rest of the plan, then the order by will not result in a sort.

And, of course you want to guarantee the ordering of results, you need order by in the outermost query. Even a query like this:

select *
from (select top 100 t.* from t order by col1) t

Does not guarantee that the results are ordered in the final result set. You really need to do:

select *
from (select top 100 t.* from t order by col1) t
order by col1;

to guarantee the results in a particular order. This behavior is documented here.

like image 142
Gordon Linoff Avatar answered Dec 06 '22 17:12

Gordon Linoff


Without ORDER BY, there is no default sort order even if you have clustered index

in this link there is a good example :

CREATE SCHEMA Data AUTHORIZATION dbo 
GO 
CREATE TABLE Data.Numbers(Number INT NOT NULL PRIMARY KEY) 
GO 
DECLARE @ID INT; 
SET NOCOUNT ON; 
SET @ID = 1; 
WHILE @ID < 100000 BEGIN 
 INSERT INTO Data.Numbers(Number) 
 SELECT @ID; 
 SET @ID = @ID+1; 
END 

CREATE TABLE Data.WideTable(ID INT NOT NULL  
CONSTRAINT PK_WideTable PRIMARY KEY, 
RandomInt INT NOT NULL, 
CHARFiller CHAR(1000)) 
GO 
CREATE VIEW dbo.WrappedRand 
AS 
SELECT RAND() AS random_value 
GO 
CREATE ALTER FUNCTION dbo.RandomInt() 
RETURNS INT 
AS 
BEGIN 
DECLARE @ret INT; 
SET @ret = (SELECT random_value*1000000 FROM dbo.WrappedRand); 
RETURN @ret; 
END 
GO 

INSERT INTO Data.WideTable(ID,RandomInt,CHARFiller) 
SELECT Number, dbo.RandomInt(), 'asdf' 
FROM Data.Numbers 
GO 
CREATE INDEX WideTable_RandomInt ON Data.WideTable(RandomInt) 
GO 
SELECT TOP 100 ID FROM Data.WideTable 

OUTPUT:

1407
253
9175
6568
4506
1623
581

As you have seen, the optimizer has chosen to use a non-clustered index to satisfy this SELECT TOP query.

Clearly you cannot assume that your results are ordered unless you explicitly use ORDER BY clause.

like image 27
nil Avatar answered Dec 06 '22 17:12

nil