Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is ORDER BY and ROW_NUMBER() deterministic?

I've used SQL in couple databases engines from time to time several years but have little theoretical knowledge so my question could be very "noobish" for some of you. But it become important to me now so I have to ask.

Imagine table Urls with non unique column status. And for the question assume that we have large amount of rows and status has the same value in every record.

And imagine we execute many times query:

SELECT * FROM Urls ORDER BY status
  1. Do we get every time the same row order or not? If we do what will happen if we add some new rows? Does it change order or new records will be appended to end of the results? And if we don't get the same order - on what conditions depend this order?

  2. Do ROW_NUMBER() OVER (ORDER BY status) will return the same order as query above or it is based on different ordering mechanism?

like image 591
MKB Avatar asked Sep 04 '13 11:09

MKB


People also ask

Is SQL order deterministic?

Each built-in function is deterministic or nondeterministic based on how the function is implemented by SQL Server. For example, specifying an ORDER BY clause in a query doesn't change the determinism of a function that is used in that query. All of the string built-in functions are deterministic, except for FORMAT.

Does ROW_NUMBER need order by?

The ORDER BY clause determines the sequence in which the rows are assigned their unique ROW_NUMBER within a specified partition. It is required.

Is ROW_NUMBER faster than group by?

The group by should be faster. The row number has to assign a row to all rows in the table.

What is the difference between rank and ROW_NUMBER in SQL?

The difference between RANK() and ROW_NUMBER() is that RANK() skips duplicate values. When there are duplicate values, the same ranking is assigned, and a gap appears in the sequence for each duplicate ranking.


2 Answers

It's very simple. If you want an ordering that you can rely upon, then you need to include enough columns in your ORDER BY clause such that the combination of all of those columns is unique for each row. Nothing else is guaranteed.

For a single table, you can usually get what you want by listing the columns that are "interesting" to sort by and then including the primary key column(s) afterwards. Since the PK, by itself, guarantees uniqueness, the whole combination is also guaranteed to uniquely define the ordering, e.g. If the Urls table has a primary key of {Site, Page, Ordinal} then the following would give you a dependable result:

SELECT * FROM Urls ORDER BY status, Site, Page, Ordinal
like image 163
Damien_The_Unbeliever Avatar answered Sep 30 '22 05:09

Damien_The_Unbeliever


ORDER BY is not stable in SQL Server (nor in any other database, as far as I know). A stable sort is one that returns records in the same order that they are found in the table.

The high-level reason is quite simple. Tables are sets. They have no order. So a "stable" sort just doesn't make sense.

The lower-level reasons are probably more important. The database could be implementing a parallel sort algorithm. Such algorithms are not, by default, stable.

If you want a stable sort, then include a key column in the sorting.

This is alluded to in the documentation:

To achieve stable results between query requests using OFFSET and FETCH, the following conditions must be met:

The underlying data that is used by the query must not change. That is, either the rows touched by the query are not updated or all requests for pages from the query are executed in a single transaction using either snapshot or serializable transaction isolation. For more information about these transaction isolation levels, see SET TRANSACTION ISOLATION LEVEL (Transact-SQL).

The ORDER BY clause contains a column or combination of columns that are guaranteed to be unique.

like image 44
Gordon Linoff Avatar answered Sep 30 '22 07:09

Gordon Linoff