Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In SQL Server, is TOP deterministic by default when used on a table with a clustered index?

Tags:

sql

sql-server

So I was trying to explain to some people why this query is a bad idea:

SELECT z.ReportDate, z.Zipcode, SUM(z.Sales) AS Sales,
COALESCE(
  (SELECT TOP (1) GroupName
  FROM dbo.zipGroups
  WHERE (Zipcode = z.Zipcode)), 'Unknown') AS GroupName,
COALESCE(
  (SELECT TOP (1) GroupCode
  FROM dbo.zipGroups
  WHERE (Zipcode = z.Zipcode)), 0) AS GroupNumber
FROM dbo.Report_ByZipcode AS z
GROUP BY z.ReportDate, z.Zipcode

and suggesting a better way to write it, when my boss ended the discussion with, "Well, it's been returning the right data for the last year and we haven't had any problems with it, so it's fine."

At which point I thought to myself, how in the world is that even possible?

After some digging, I discovered these facts:

  1. This query is supposed to group sales by Zipcode and date, and link those to the largest Group (by population size) that a Zipcode is assigned to by way of the zipGroups table.
  2. Each Zipcode can be assigned to 0 to many Groups, and if a Zipcode is assigned to 0 Groups, it's simply not in the zipGroups table.
  3. A Group is a geographical area, and the GroupNumbers are ranked by largest to smallest by population (for example, the group covering the NY-NJ-CT tri-state area is GroupNumber 1, and North Platte, Nebraska is GroupNumber 209).
  4. The zipGroups table has not changed in at least 2 years.
  5. The zipGroups table has a clustered index with Zipcode, GroupNumber (ascending) as the keys.
  6. The combination of Zipcode, GroupNumber is unique in zipGroups.

So my question has 2 parts.

A) Even though there are no ORDER BY clauses in those SELECT TOP queries, are they actually deterministic because the clustered index is basically providing it a default ORDER BY?

B1) If that is true, is the query, however precariously, actually doing what it's supposed to do?

B2) If that is not true, can you help me prove it?

Note: I've already re-written this to use joins, so I don't need the SQL to fix it, I need to get it into production so I stop worrying about it breaking.

like image 376
Jason Avatar asked Feb 10 '11 21:02

Jason


People also ask

Are SQL queries deterministic?

SQL functions in SQLite can be either "deterministic" or "non-deterministic". A deterministic function always gives the same answer when it has the same inputs. Most built-in SQL functions in SQLite are deterministic. For example, the abs(X) function always returns the same answer as long as its input X is the same.

Is clustered index automatically created on primary key?

PRIMARY KEY and UNIQUE constraints When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index.

What is the advantage of the clustered index?

A clustered index is useful for range queries because the data is logically sorted on the key. You can move a table to another filegroup by recreating the clustered index on a different filegroup. You do not have to drop the table as you would to move a heap. A clustering key is a part of all nonclustered indexes.


2 Answers

SQL Server makes no guarantees about the ordering of records in the absence of ORDER BY. It might yield the correct results 999,999 times and then fail on the millionth try. Don't do it.

like image 164
Marcelo Cantos Avatar answered Nov 15 '22 06:11

Marcelo Cantos


Always use an order by with a TOP statement. The order is not guaranteed to be in the order of the clustered index as demonstrate in this blog post (complete with a query that disproves it):

Without ORDER BY, there is no default sort order.

Even if it did go by the clustered index, I wouldn't write queries that depend on undocumented behavior of the DB engine and it is better to be explicit for readability.

like image 44
JohnFx Avatar answered Nov 15 '22 07:11

JohnFx