Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server GUID sort algorithm. Why?

Problem with UniqueIdentifiers

We have an existing database which uses uniqueidentifiers extensively (unfortunately!) both as primary keys and some nullable columns of some tables. We came across a situation where some reports that run on these tables sort on these uniqueidentifiers because there is no other column in the table that would give a meaningful sort (isn't that ironic!). The intent was to sort so that it shows the items in the order they were inserted but they were not inserted using NewSequentialId() - hence a waste of time.

Fact about the Sort Algorithm

Anyway, considering SQL Server sorts uniqueidentifiers based on byte groups starting from the ending 5th byte group (6 bytes) and moving towards the 1st byte group (4 bytes) reversing the order on the 3rd byte-group (2 bytes) from right-left to left-right,

My Question

I was curious to know if there is any real life situation that this kind of sort helps at all.

How does SQL Server store the uniqueidentifier internally which might provide insight on why it has this whacky sort algorithm?

Reference:

Alberto Ferrari's discovery of the SQL Server GUID sort

Example

Uniqueidentifiers are sorted as shown below when you use a Order By on a uniqueidentifier column having the below data.

Please note that the below data is sorted ascendingly and highest sort preference is from the 5th byte group towards the 1st byte group (backwards).

-- 1st byte group of 4 bytes sorted in the reverse (left-to-right) order below --   01000000-0000-0000-0000-000000000000 10000000-0000-0000-0000-000000000000 00010000-0000-0000-0000-000000000000 00100000-0000-0000-0000-000000000000 00000100-0000-0000-0000-000000000000 00001000-0000-0000-0000-000000000000 00000001-0000-0000-0000-000000000000 00000010-0000-0000-0000-000000000000  -- 2nd byte group of 2 bytes sorted in the reverse (left-to-right) order below --   00000000-0100-0000-0000-000000000000 00000000-1000-0000-0000-000000000000 00000000-0001-0000-0000-000000000000 00000000-0010-0000-0000-000000000000  -- 3rd byte group of 2 bytes sorted in the reverse (left-to-right) order below --   00000000-0000-0100-0000-000000000000 00000000-0000-1000-0000-000000000000 00000000-0000-0001-0000-000000000000 00000000-0000-0010-0000-000000000000  -- 4th byte group of 2 bytes sorted in the straight (right-to-left) order below --   00000000-0000-0000-0001-000000000000 00000000-0000-0000-0010-000000000000 00000000-0000-0000-0100-000000000000 00000000-0000-0000-1000-000000000000  -- 5th byte group of 6 bytes sorted in the straight (right-to-left) order below --   00000000-0000-0000-0000-000000000001 00000000-0000-0000-0000-000000000010 00000000-0000-0000-0000-000000000100 00000000-0000-0000-0000-000000001000 00000000-0000-0000-0000-000000010000 00000000-0000-0000-0000-000000100000 00000000-0000-0000-0000-000001000000 00000000-0000-0000-0000-000010000000 00000000-0000-0000-0000-000100000000 00000000-0000-0000-0000-001000000000 00000000-0000-0000-0000-010000000000 00000000-0000-0000-0000-100000000000 

Code:

Alberto's code extended to denote that sorting is on the bytes and not on the individual bits.

With Test_UIDs As (--                     0 1 2 3  4 5  6 7  8 9  A B C D E F             Select ID =  1, UID = cast ('00000000-0000-0000-0000-100000000000' as uniqueidentifier)     Union   Select ID =  2, UID = cast ('00000000-0000-0000-0000-010000000000' as uniqueidentifier)     Union   Select ID =  3, UID = cast ('00000000-0000-0000-0000-001000000000' as uniqueidentifier)     Union   Select ID =  4, UID = cast ('00000000-0000-0000-0000-000100000000' as uniqueidentifier)     Union   Select ID =  5, UID = cast ('00000000-0000-0000-0000-000010000000' as uniqueidentifier)     Union   Select ID =  6, UID = cast ('00000000-0000-0000-0000-000001000000' as uniqueidentifier)     Union   Select ID =  7, UID = cast ('00000000-0000-0000-0000-000000100000' as uniqueidentifier)     Union   Select ID =  8, UID = cast ('00000000-0000-0000-0000-000000010000' as uniqueidentifier)     Union   Select ID =  9, UID = cast ('00000000-0000-0000-0000-000000001000' as uniqueidentifier)     Union   Select ID = 10, UID = cast ('00000000-0000-0000-0000-000000000100' as uniqueidentifier)     Union   Select ID = 11, UID = cast ('00000000-0000-0000-0000-000000000010' as uniqueidentifier)     Union   Select ID = 12, UID = cast ('00000000-0000-0000-0000-000000000001' as uniqueidentifier)     Union   Select ID = 13, UID = cast ('00000000-0000-0000-0001-000000000000' as uniqueidentifier)     Union   Select ID = 14, UID = cast ('00000000-0000-0000-0010-000000000000' as uniqueidentifier)     Union   Select ID = 15, UID = cast ('00000000-0000-0000-0100-000000000000' as uniqueidentifier)     Union   Select ID = 16, UID = cast ('00000000-0000-0000-1000-000000000000' as uniqueidentifier)     Union   Select ID = 17, UID = cast ('00000000-0000-0001-0000-000000000000' as uniqueidentifier)     Union   Select ID = 18, UID = cast ('00000000-0000-0010-0000-000000000000' as uniqueidentifier)     Union   Select ID = 19, UID = cast ('00000000-0000-0100-0000-000000000000' as uniqueidentifier)     Union   Select ID = 20, UID = cast ('00000000-0000-1000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 21, UID = cast ('00000000-0001-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 22, UID = cast ('00000000-0010-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 23, UID = cast ('00000000-0100-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 24, UID = cast ('00000000-1000-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 25, UID = cast ('00000001-0000-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 26, UID = cast ('00000010-0000-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 27, UID = cast ('00000100-0000-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 28, UID = cast ('00001000-0000-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 29, UID = cast ('00010000-0000-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 30, UID = cast ('00100000-0000-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 31, UID = cast ('01000000-0000-0000-0000-000000000000' as uniqueidentifier)     Union   Select ID = 32, UID = cast ('10000000-0000-0000-0000-000000000000' as uniqueidentifier) ) Select * From Test_UIDs Order By UID, ID 
like image 878
Kash Avatar asked Oct 18 '11 16:10

Kash


People also ask

What is the use of GUID in SQL?

The globally unique identifier (GUID) data type in SQL Server is represented by the uniqueidentifier data type, which stores a 16-byte binary value. A GUID is a binary number, and its main use is as an identifier that must be unique in a network that has many computers at many sites.

Is GUID good for primary key?

GUIDs can be considered as global primary keys. Local primary keys are used to uniquely identify records within a table. On the other hand, GUIDs can be used to uniquely identify records across tables, databases, and servers.

Can GUID be sorted?

Yes, SQL has its own GUID, because it's SQL. Not only does it have its own GUID, it has its own GUID sorting algorithm.

What sort algorithm does SQL Server use?

There are two different sort logics used by SQL Server to handle sorting! First one is the quick sort and another one is Merge sort. It begins sort in memory using Quick Sort algorithm but note that the memory requirement for this is at least 200% of its input rows size.


1 Answers

The algorithm is documented by the SQL Server guys here: How are GUIDs compared in SQL Server 2005? I Quote here here (since it's an old article that may be gone forever in a few years)

In general, equality comparisons make a lot of sense with uniqueidentifier values. However, if you find yourself needing general ordering, then you might be looking at the wrong data type and should consider various integer types instead.

If, after careful thought, you decide to order on a uniqueidentifier column, you might be surprised by what you get back.

Given these two uniqueidentifier values:

@g1= '55666BEE-B3A0-4BF5-81A7-86FF976E763F' @g2 = '8DD5BCA5-6ABE-4F73-B4B7-393AE6BBB849'

Many people think that @g1 is less than @g2, since '55666BEE' is certainly smaller than '8DD5BCA5'. However, this is not how SQL Server 2005 compares uniqueidentifier values.

The comparison is made by looking at byte "groups" right-to-left, and left-to-right within a byte "group". A byte group is what is delimited by the '-' character. More technically, we look at bytes {10 to 15} first, then {8-9}, then {6-7}, then {4-5}, and lastly {0 to 3}.

In this specific example, we would start by comparing '86FF976E763F' with '393AE6BBB849'. Immediately we see that @g2 is indeed greater than @g1.

Note that in .NET languages, Guid values have a different default sort order than in SQL Server. If you find the need to order an array or list of Guid using SQL Server comparison semantics, you can use an array or list of SqlGuid instead, which implements IComparable in a way which is consistent with SQL Server semantics.

Plus, the sort follows byte groups endianness (see here: Globally unique identifier). The groups 10-15 and 8-9 are stored as big endian (corresponding to the Data4 in the wikipedia article), so they are compared as big endian. Other groups are compared using little endian.

like image 188
Simon Mourier Avatar answered Sep 19 '22 13:09

Simon Mourier