So I was trying to explain to some people why this query is a bad idea:
SELECT z.ReportDate, z.Zipcode, SUM(z.Sales) AS Sales,
COALESCE(
(SELECT TOP (1) GroupName
FROM dbo.zipGroups
WHERE (Zipcode = z.Zipcode)), 'Unknown') AS GroupName,
COALESCE(
(SELECT TOP (1) GroupCode
FROM dbo.zipGroups
WHERE (Zipcode = z.Zipcode)), 0) AS GroupNumber
FROM dbo.Report_ByZipcode AS z
GROUP BY z.ReportDate, z.Zipcode
and suggesting a better way to write it, when my boss ended the discussion with, "Well, it's been returning the right data for the last year and we haven't had any problems with it, so it's fine."
At which point I thought to myself, how in the world is that even possible?
After some digging, I discovered these facts:
So my question has 2 parts.
A) Even though there are no ORDER BY clauses in those SELECT TOP queries, are they actually deterministic because the clustered index is basically providing it a default ORDER BY?
B1) If that is true, is the query, however precariously, actually doing what it's supposed to do?
B2) If that is not true, can you help me prove it?
Note: I've already re-written this to use joins, so I don't need the SQL to fix it, I need to get it into production so I stop worrying about it breaking.
SQL functions in SQLite can be either "deterministic" or "non-deterministic". A deterministic function always gives the same answer when it has the same inputs. Most built-in SQL functions in SQLite are deterministic. For example, the abs(X) function always returns the same answer as long as its input X is the same.
PRIMARY KEY and UNIQUE constraints When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index.
A clustered index is useful for range queries because the data is logically sorted on the key. You can move a table to another filegroup by recreating the clustered index on a different filegroup. You do not have to drop the table as you would to move a heap. A clustering key is a part of all nonclustered indexes.
SQL Server makes no guarantees about the ordering of records in the absence of ORDER BY. It might yield the correct results 999,999 times and then fail on the millionth try. Don't do it.
Always use an order by with a TOP statement. The order is not guaranteed to be in the order of the clustered index as demonstrate in this blog post (complete with a query that disproves it):
Without ORDER BY, there is no default sort order.
Even if it did go by the clustered index, I wouldn't write queries that depend on undocumented behavior of the DB engine and it is better to be explicit for readability.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With