In SQL Server, is TOP deterministic by default when used on a table with a clustered index?

Tags: sql, sql-server

So I was trying to explain to some people why this query is a bad idea:

SELECT z.ReportDate, z.Zipcode, SUM(z.Sales) AS Sales,
COALESCE(
  (SELECT TOP (1) GroupName
  FROM dbo.zipGroups
  WHERE (Zipcode = z.Zipcode)), 'Unknown') AS GroupName,
COALESCE(
  (SELECT TOP (1) GroupCode
  FROM dbo.zipGroups
  WHERE (Zipcode = z.Zipcode)), 0) AS GroupNumber
FROM dbo.Report_ByZipcode AS z
GROUP BY z.ReportDate, z.Zipcode

and suggesting a better way to write it, when my boss ended the discussion with, "Well, it's been returning the right data for the last year and we haven't had any problems with it, so it's fine."

At which point I thought to myself, how in the world is that even possible?

After some digging, I discovered these facts:

1. This query is supposed to group sales by Zipcode and date, and link those to the largest Group (by population size) that a Zipcode is assigned to by way of the zipGroups table.
2. Each Zipcode can be assigned to 0 to many Groups, and if a Zipcode is assigned to 0 Groups, it's simply not in the zipGroups table.
3. A Group is a geographical area, and the GroupNumbers are ranked by largest to smallest by population (for example, the group covering the NY-NJ-CT tri-state area is GroupNumber 1, and North Platte, Nebraska is GroupNumber 209).
4. The zipGroups table has not changed in at least 2 years.
5. The zipGroups table has a clustered index with Zipcode, GroupNumber (ascending) as the keys.
6. The combination of Zipcode, GroupNumber is unique in zipGroups.

So my question has 2 parts.

A) Even though there are no ORDER BY clauses in those SELECT TOP queries, are they actually deterministic because the clustered index is basically providing it a default ORDER BY?

B1) If that is true, is the query, however precariously, actually doing what it's supposed to do?

B2) If that is not true, can you help me prove it?

Note: I've already re-written this to use joins, so I don't need the SQL to fix it, I need to get it into production so I stop worrying about it breaking.

asked Feb 10 '11 21:02

Jason

2 Answers

SQL Server makes no guarantees about the ordering of records in the absence of ORDER BY. It might yield the correct results 999,999 times and then fail on the millionth try. Don't do it.
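As a rough way to see this for part B2, here is a minimal sketch that mirrors the shape of zipGroups in a temp table. The table `#zipGroups`, the index `IX_zipGroups_Name`, and the sample rows are all made up for illustration; they are not from the real schema. The index hints are only there to make the access path explicit. Without them the optimizer is free to pick either index, which is exactly why the unordered TOP is not something to rely on.

```sql
-- Hypothetical repro table shaped like dbo.zipGroups as described in the question.
CREATE TABLE #zipGroups (
    Zipcode     char(5)     NOT NULL,
    GroupNumber int         NOT NULL,
    GroupName   varchar(50) NOT NULL,
    PRIMARY KEY CLUSTERED (Zipcode, GroupNumber)
);

-- Sample data (invented): the larger group has the lower GroupNumber,
-- but the smaller group's name sorts first alphabetically.
INSERT INTO #zipGroups (Zipcode, GroupNumber, GroupName)
VALUES ('10001', 1,   'Tri-State Area'),
       ('10001', 209, 'Alpha Area');

-- A second index (assumed, for the demo) whose key order differs
-- from the clustered index.
CREATE NONCLUSTERED INDEX IX_zipGroups_Name
    ON #zipGroups (Zipcode, GroupName);

-- Access path 1: read through the clustered index (index id 1).
SELECT TOP (1) GroupName
FROM #zipGroups WITH (INDEX(1))
WHERE Zipcode = '10001';

-- Access path 2: read through the nonclustered index.
SELECT TOP (1) GroupName
FROM #zipGroups WITH (INDEX(IX_zipGroups_Name))
WHERE Zipcode = '10001';

DROP TABLE #zipGroups;
```

If the two SELECT statements return different GroupName values on your instance, that shows the "first" row depends on the access path rather than on the clustered index alone. Someone adding an index to the real table later could have the same effect without touching the query.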

answered Nov 15 '22 06:11

Marcelo Cantos

Always use an ORDER BY with TOP. Rows are not guaranteed to come back in clustered-index order, as demonstrated in this blog post (complete with a query that disproves it):

Without ORDER BY, there is no default sort order.

Even if it did go by the clustered index, I wouldn't write queries that depend on undocumented behavior of the DB engine. Being explicit is also better for readability.
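For illustration, the smallest change that makes the intent explicit is to add an ORDER BY to each subquery in the query from the question. This is only a sketch: it assumes GroupNumber is a column of dbo.zipGroups (as the description of the clustered index suggests) and that the lowest GroupNumber identifies the largest group.

```sql
SELECT z.ReportDate, z.Zipcode, SUM(z.Sales) AS Sales,
COALESCE(
  (SELECT TOP (1) GroupName
  FROM dbo.zipGroups
  WHERE (Zipcode = z.Zipcode)
  ORDER BY GroupNumber ASC), 'Unknown') AS GroupName,  -- assumed: lowest GroupNumber = largest group
COALESCE(
  (SELECT TOP (1) GroupCode
  FROM dbo.zipGroups
  WHERE (Zipcode = z.Zipcode)
  ORDER BY GroupNumber ASC), 0) AS GroupNumber         -- same ordering, so both values come from the same row
FROM dbo.Report_ByZipcode AS z
GROUP BY z.ReportDate, z.Zipcode
```

Because (Zipcode, GroupNumber) is unique, the ORDER BY leaves no ties, so both subqueries pick the same row no matter which index the optimizer uses.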

answered Nov 15 '22 07:11

JohnFx
