
ROW_NUMBER vs IDENTITY and ORDER BY

Is there any difference (in terms of result set, performance, or semantic meaning) between using ROW_NUMBER and using IDENTITY with an ORDER BY clause in MS SQL Server? For instance, given a table with a column "FirstName", is there any difference between

SELECT FirstName, ROW_NUMBER() OVER (ORDER BY FirstName) AS Position
INTO #MyTempTable
FROM MyTable

and

SELECT FirstName, IDENTITY(BIGINT) AS Position
INTO #MyTempTable
FROM MyTable
ORDER BY FirstName
Jacob Horbulyk asked Jul 25 '16 18:07





1 Answer

The semantic meaning is different. The first example creates a plain integer column (bigint, the return type of row_number()) filled with sequential values.

The second example, using identity(), creates an identity column. That means rows inserted later are automatically assigned the next value in the sequence.

For instance, run this code:

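-- Create a temp table whose id column is an identity column
select 'a' as x, identity(int, 1, 1) as id
into #t;

-- No id is supplied here; the identity property generates it
insert into #t(x) values('b');

-- The inserted row shows up with id = 2
select *
from #t;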
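
For contrast, here is a minimal sketch of the same shape built with row_number() instead. The temp table name #t_rownum is hypothetical, and the metadata query against tempdb.sys.columns is simply one way to look at what SELECT ... INTO created; it is not part of the original answer. I would expect is_identity to come back as 0 here, whereas the same query against #t above should report 1, but run it yourself rather than take that on faith.

```sql
-- Hypothetical counterpart to the identity() example, numbered with row_number()
select v.x, row_number() over (order by v.x) as id
into #t_rownum
from (values ('a')) as v(x);

-- Inspect the column metadata of the temp table that SELECT ... INTO created
select c.name,
       type_name(c.system_type_id) as data_type,
       c.is_identity,
       c.is_nullable
from tempdb.sys.columns as c
where c.object_id = object_id('tempdb..#t_rownum');

drop table #t_rownum;
```

Since the id column here carries no identity property, a later insert that leaves id out does not continue the numbering on its own.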

As for performance, the two should be essentially the same in your case, because the rows have to be sorted by FirstName either way. If the rows were wider, I wouldn't be surprised if the row_number() version edged out the other. With row_number(), only the one column needs to be sorted and then mapped back to the original data; with identity(), the entire row needs to be sorted. That said, this performance difference is just informed speculation.
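
Rather than rely on speculation, one rough way to check is to run both statements from the question with timing and I/O statistics switched on and compare the output in the Messages tab. This is only a sketch: MyTable and FirstName come from the question, while the temp table names #RowNumberVersion and #IdentityVersion are made up for illustration. Comparing the actual execution plans would tell you more than the raw timings.

```sql
-- Report CPU/elapsed time and logical reads for each statement
set statistics time on;
set statistics io on;

-- Version 1: number the rows with row_number()
select FirstName, row_number() over (order by FirstName) as Position
into #RowNumberVersion   -- hypothetical temp table name
from MyTable;

-- Version 2: number the rows with identity() plus ORDER BY
select FirstName, identity(bigint) as Position
into #IdentityVersion    -- hypothetical temp table name
from MyTable
order by FirstName;

set statistics io off;
set statistics time off;

-- Clean up so the script can be rerun
drop table #RowNumberVersion;
drop table #IdentityVersion;
```

Run it several times and on realistically sized data; a single run on a small table says little about either approach.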

Gordon Linoff answered Sep 18 '22 10:09
