Problem with UniqueIdentifiers
We have an existing database which uses uniqueidentifiers extensively (unfortunately!) both as primary keys and some nullable columns of some tables. We came across a situation where some reports that run on these tables sort on these uniqueidentifiers because there is no other column in the table that would give a meaningful sort (isn't that ironic!). The intent was to sort so that it shows the items in the order they were inserted but they were not inserted using NewSequentialId()
- hence a waste of time.
Fact about the Sort Algorithm
Anyway, considering SQL Server sorts uniqueidentifiers based on byte groups starting from the ending 5th byte group (6 bytes) and moving towards the 1st byte group (4 bytes) reversing the order on the 3rd byte-group (2 bytes) from right-left to left-right,
My Question
I was curious to know if there is any real life situation that this kind of sort helps at all.
How does SQL Server store the uniqueidentifier internally which might provide insight on why it has this whacky sort algorithm?
Reference:
Alberto Ferrari's discovery of the SQL Server GUID sort
Example
Uniqueidentifiers are sorted as shown below when you use a Order By on a uniqueidentifier column having the below data.
Please note that the below data is sorted ascendingly and highest sort preference is from the 5th byte group towards the 1st byte group (backwards).
-- 1st byte group of 4 bytes sorted in the reverse (left-to-right) order below -- 01000000-0000-0000-0000-000000000000 10000000-0000-0000-0000-000000000000 00010000-0000-0000-0000-000000000000 00100000-0000-0000-0000-000000000000 00000100-0000-0000-0000-000000000000 00001000-0000-0000-0000-000000000000 00000001-0000-0000-0000-000000000000 00000010-0000-0000-0000-000000000000 -- 2nd byte group of 2 bytes sorted in the reverse (left-to-right) order below -- 00000000-0100-0000-0000-000000000000 00000000-1000-0000-0000-000000000000 00000000-0001-0000-0000-000000000000 00000000-0010-0000-0000-000000000000 -- 3rd byte group of 2 bytes sorted in the reverse (left-to-right) order below -- 00000000-0000-0100-0000-000000000000 00000000-0000-1000-0000-000000000000 00000000-0000-0001-0000-000000000000 00000000-0000-0010-0000-000000000000 -- 4th byte group of 2 bytes sorted in the straight (right-to-left) order below -- 00000000-0000-0000-0001-000000000000 00000000-0000-0000-0010-000000000000 00000000-0000-0000-0100-000000000000 00000000-0000-0000-1000-000000000000 -- 5th byte group of 6 bytes sorted in the straight (right-to-left) order below -- 00000000-0000-0000-0000-000000000001 00000000-0000-0000-0000-000000000010 00000000-0000-0000-0000-000000000100 00000000-0000-0000-0000-000000001000 00000000-0000-0000-0000-000000010000 00000000-0000-0000-0000-000000100000 00000000-0000-0000-0000-000001000000 00000000-0000-0000-0000-000010000000 00000000-0000-0000-0000-000100000000 00000000-0000-0000-0000-001000000000 00000000-0000-0000-0000-010000000000 00000000-0000-0000-0000-100000000000
Code:
Alberto's code extended to denote that sorting is on the bytes and not on the individual bits.
With Test_UIDs As (-- 0 1 2 3 4 5 6 7 8 9 A B C D E F Select ID = 1, UID = cast ('00000000-0000-0000-0000-100000000000' as uniqueidentifier) Union Select ID = 2, UID = cast ('00000000-0000-0000-0000-010000000000' as uniqueidentifier) Union Select ID = 3, UID = cast ('00000000-0000-0000-0000-001000000000' as uniqueidentifier) Union Select ID = 4, UID = cast ('00000000-0000-0000-0000-000100000000' as uniqueidentifier) Union Select ID = 5, UID = cast ('00000000-0000-0000-0000-000010000000' as uniqueidentifier) Union Select ID = 6, UID = cast ('00000000-0000-0000-0000-000001000000' as uniqueidentifier) Union Select ID = 7, UID = cast ('00000000-0000-0000-0000-000000100000' as uniqueidentifier) Union Select ID = 8, UID = cast ('00000000-0000-0000-0000-000000010000' as uniqueidentifier) Union Select ID = 9, UID = cast ('00000000-0000-0000-0000-000000001000' as uniqueidentifier) Union Select ID = 10, UID = cast ('00000000-0000-0000-0000-000000000100' as uniqueidentifier) Union Select ID = 11, UID = cast ('00000000-0000-0000-0000-000000000010' as uniqueidentifier) Union Select ID = 12, UID = cast ('00000000-0000-0000-0000-000000000001' as uniqueidentifier) Union Select ID = 13, UID = cast ('00000000-0000-0000-0001-000000000000' as uniqueidentifier) Union Select ID = 14, UID = cast ('00000000-0000-0000-0010-000000000000' as uniqueidentifier) Union Select ID = 15, UID = cast ('00000000-0000-0000-0100-000000000000' as uniqueidentifier) Union Select ID = 16, UID = cast ('00000000-0000-0000-1000-000000000000' as uniqueidentifier) Union Select ID = 17, UID = cast ('00000000-0000-0001-0000-000000000000' as uniqueidentifier) Union Select ID = 18, UID = cast ('00000000-0000-0010-0000-000000000000' as uniqueidentifier) Union Select ID = 19, UID = cast ('00000000-0000-0100-0000-000000000000' as uniqueidentifier) Union Select ID = 20, UID = cast ('00000000-0000-1000-0000-000000000000' as uniqueidentifier) Union Select ID = 21, UID = cast ('00000000-0001-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 22, UID = cast ('00000000-0010-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 23, UID = cast ('00000000-0100-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 24, UID = cast ('00000000-1000-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 25, UID = cast ('00000001-0000-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 26, UID = cast ('00000010-0000-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 27, UID = cast ('00000100-0000-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 28, UID = cast ('00001000-0000-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 29, UID = cast ('00010000-0000-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 30, UID = cast ('00100000-0000-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 31, UID = cast ('01000000-0000-0000-0000-000000000000' as uniqueidentifier) Union Select ID = 32, UID = cast ('10000000-0000-0000-0000-000000000000' as uniqueidentifier) ) Select * From Test_UIDs Order By UID, ID
The globally unique identifier (GUID) data type in SQL Server is represented by the uniqueidentifier data type, which stores a 16-byte binary value. A GUID is a binary number, and its main use is as an identifier that must be unique in a network that has many computers at many sites.
GUIDs can be considered as global primary keys. Local primary keys are used to uniquely identify records within a table. On the other hand, GUIDs can be used to uniquely identify records across tables, databases, and servers.
Yes, SQL has its own GUID, because it's SQL. Not only does it have its own GUID, it has its own GUID sorting algorithm.
There are two different sort logics used by SQL Server to handle sorting! First one is the quick sort and another one is Merge sort. It begins sort in memory using Quick Sort algorithm but note that the memory requirement for this is at least 200% of its input rows size.
The algorithm is documented by the SQL Server guys here: How are GUIDs compared in SQL Server 2005? I Quote here here (since it's an old article that may be gone forever in a few years)
In general, equality comparisons make a lot of sense with uniqueidentifier values. However, if you find yourself needing general ordering, then you might be looking at the wrong data type and should consider various integer types instead.
If, after careful thought, you decide to order on a uniqueidentifier column, you might be surprised by what you get back.
Given these two uniqueidentifier values:
@g1= '55666BEE-B3A0-4BF5-81A7-86FF976E763F' @g2 = '8DD5BCA5-6ABE-4F73-B4B7-393AE6BBB849'
Many people think that @g1 is less than @g2, since '55666BEE' is certainly smaller than '8DD5BCA5'. However, this is not how SQL Server 2005 compares uniqueidentifier values.
The comparison is made by looking at byte "groups" right-to-left, and left-to-right within a byte "group". A byte group is what is delimited by the '-' character. More technically, we look at bytes {10 to 15} first, then {8-9}, then {6-7}, then {4-5}, and lastly {0 to 3}.
In this specific example, we would start by comparing '86FF976E763F' with '393AE6BBB849'. Immediately we see that @g2 is indeed greater than @g1.
Note that in .NET languages, Guid values have a different default sort order than in SQL Server. If you find the need to order an array or list of Guid using SQL Server comparison semantics, you can use an array or list of SqlGuid instead, which implements IComparable in a way which is consistent with SQL Server semantics.
Plus, the sort follows byte groups endianness (see here: Globally unique identifier). The groups 10-15 and 8-9 are stored as big endian (corresponding to the Data4 in the wikipedia article), so they are compared as big endian. Other groups are compared using little endian.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With