I'm in the middle of doing a bit of research, and I've come across an anomaly that I can't explain (and I've not been able to find anything on Google). Consider the following SQL:
CREATE TABLE MyGuid (
ID UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID()
)
GO
CREATE TABLE MyGuidSeq (
ID UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWSEQUENTIALID()
)
GO
DECLARE @i INT, @noRecords INT
SET @noRecords = 1000000
-- MyGuid
SET @i = 1
WHILE (@i <= @noRecords)
BEGIN
INSERT INTO MyGuid DEFAULT VALUES
SET @i = @i + 1
END
-- MyGuidSeq
SET @i = 1
WHILE (@i <= @noRecords)
BEGIN
INSERT INTO MyGuidSeq DEFAULT VALUES
SET @i = @i + 1
END
exec sp_spaceused 'MyGuid', true
exec sp_spaceused 'MyGuidSeq', true
Results:
Table name No. Rows Reserved Space Actual space Index Size Unused Space
MyGuid 1,000,000 34,760 KB 34,552 KB 160 KB 48 KB
MyGuidSeq 1,000,000 24,968 KB 24,768 KB 176 KB 24 KB
Question
Can anyone explain why the reserved / actual space is considerably smaller when using NEWSEQUENTIALID() over NEWID()?
In Response To Answer
I ran the following test to check on the answer I was given by Luaan below:
CREATE TABLE MyGuid (
ID UNIQUEIDENTIFIER DEFAULT NEWID()
)
GO
CREATE TABLE MyGuidSeq (
ID UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID()
)
GO
DECLARE @i INT, @noRecords INT
SET @noRecords = 1000000
-- MyGuid
SET @i = 1
WHILE (@i <= @noRecords)
BEGIN
INSERT INTO MyGuid DEFAULT VALUES
SET @i = @i + 1
END
-- MyGuidSeq
SET @i = 1
WHILE (@i <= @noRecords)
BEGIN
INSERT INTO MyGuidSeq DEFAULT VALUES
SET @i = @i + 1
END
exec sp_spaceused 'MyGuid', true
exec sp_spaceused 'MyGuidSeq', true
The difference being is that I've removed the primary key (meaning the table is now a heap table). This now results in exactly the same sizes for both tables. This proves that the table is physically organized by its clustered index.
name rows reserved data index_size unused
MyGuid 1000000 25992 KB 25976 KB 8 KB 8 KB
MyGuidSeq 1000000 25992 KB 25976 KB 8 KB 8 KB
In assigning the default value of NEWID() , each new and existing row has a unique value for the CustomerID column.
The NEWID() function in SQL Server creates a unique value of type uniqueidentifier. One use of the NEWID() function is in generating random rows from a table.
SQL Server NEWID function is a system function that is used to generate a random unique value of type uniqueidentifier. A return type of NEWID function is uniqueidentifier. Uniqueidentifier is a Microsoft SQL Server data type that is used to store Globally Unique Identifiers (GUIDs). It can store 16 bytes of data.
This is related to partitioning. Basically, newId()
will create GUIDs in random order, which means that you're inserting into the middle of the table all the time. Sequential IDs, on the other hand, will always append to the end of the table, which is much simpler.
If you want to know more, look at some materials on paging. A good start might be the official MSDN page on MS SQL paging - http://technet.microsoft.com/en-us/library/ms190969(v=sql.105).aspx
You also have to understand that rows are inherently organized by ID in the physical files that store the database data. A file with no spaces between IDs (such as when using identity columns and no deletion) can take less space to store the same amount of data.
I'd expect that a full shrink of the database will significantly reduce the amount of space lost to fragmentation in MyGuid table, while it will do very little to MyGuidSeq size.
If you can use sequential GUIDs, do so - they improve INSERT efficiency a lot, and by extension, indices can also be less fragmented and smaller overall.
You're not showing the "time taken" debug outputs, but I expect that those are significantly different as well (even though this can be very much offset by the memory available to the database - it doesn't need to change the data files immediately; if you want to know more about this, look up something about transaction logs).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With