Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does using NEWID() use more space than NEWSEQUENTIALID()?

I'm in the middle of doing a bit of research, and I've come across an anomaly that I can't explain (and I've not been able to find anything on Google). Consider the following SQL:

CREATE TABLE MyGuid (
    ID UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID()
)
GO
CREATE TABLE MyGuidSeq (
    ID UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWSEQUENTIALID()
)
GO

DECLARE @i INT, @noRecords INT
SET @noRecords = 1000000

-- MyGuid
SET @i = 1
WHILE (@i <= @noRecords)
    BEGIN
        INSERT INTO MyGuid DEFAULT VALUES
        SET @i = @i + 1
    END

-- MyGuidSeq
SET @i = 1
WHILE (@i <= @noRecords)
    BEGIN
        INSERT INTO MyGuidSeq DEFAULT VALUES
        SET @i = @i + 1
    END

exec sp_spaceused 'MyGuid', true
exec sp_spaceused 'MyGuidSeq', true

Results:

Table name  No. Rows   Reserved Space  Actual space  Index Size  Unused Space
MyGuid      1,000,000  34,760 KB       34,552 KB     160 KB      48 KB
MyGuidSeq   1,000,000  24,968 KB       24,768 KB     176 KB      24 KB

Question

Can anyone explain why the reserved / actual space is considerably smaller when using NEWSEQUENTIALID() over NEWID()?

In Response To Answer

I ran the following test to check on the answer I was given by Luaan below:

CREATE TABLE MyGuid (
    ID UNIQUEIDENTIFIER DEFAULT NEWID()
)
GO
CREATE TABLE MyGuidSeq (
    ID UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID()
)
GO

DECLARE @i INT, @noRecords INT
SET @noRecords = 1000000
-- MyGuid
SET @i = 1
WHILE (@i <= @noRecords)
    BEGIN
        INSERT INTO MyGuid DEFAULT VALUES
        SET @i = @i + 1
    END

-- MyGuidSeq
SET @i = 1
WHILE (@i <= @noRecords)
    BEGIN
        INSERT INTO MyGuidSeq DEFAULT VALUES
        SET @i = @i + 1
    END

exec sp_spaceused 'MyGuid', true
exec sp_spaceused 'MyGuidSeq', true

The difference being is that I've removed the primary key (meaning the table is now a heap table). This now results in exactly the same sizes for both tables. This proves that the table is physically organized by its clustered index.

name       rows    reserved  data      index_size  unused
MyGuid     1000000 25992 KB  25976 KB  8 KB        8 KB
MyGuidSeq  1000000 25992 KB  25976 KB  8 KB        8 KB
like image 950
Spikeh Avatar asked Jan 07 '14 14:01

Spikeh


People also ask

Is SQL Newid unique?

In assigning the default value of NEWID() , each new and existing row has a unique value for the CustomerID column.

What is Newid?

The NEWID() function in SQL Server creates a unique value of type uniqueidentifier. One use of the NEWID() function is in generating random rows from a table.

What datatype is newid ()?

SQL Server NEWID function is a system function that is used to generate a random unique value of type uniqueidentifier. A return type of NEWID function is uniqueidentifier. Uniqueidentifier is a Microsoft SQL Server data type that is used to store Globally Unique Identifiers (GUIDs). It can store 16 bytes of data.


1 Answers

This is related to partitioning. Basically, newId() will create GUIDs in random order, which means that you're inserting into the middle of the table all the time. Sequential IDs, on the other hand, will always append to the end of the table, which is much simpler.

If you want to know more, look at some materials on paging. A good start might be the official MSDN page on MS SQL paging - http://technet.microsoft.com/en-us/library/ms190969(v=sql.105).aspx

You also have to understand that rows are inherently organized by ID in the physical files that store the database data. A file with no spaces between IDs (such as when using identity columns and no deletion) can take less space to store the same amount of data.

I'd expect that a full shrink of the database will significantly reduce the amount of space lost to fragmentation in MyGuid table, while it will do very little to MyGuidSeq size.

If you can use sequential GUIDs, do so - they improve INSERT efficiency a lot, and by extension, indices can also be less fragmented and smaller overall.

You're not showing the "time taken" debug outputs, but I expect that those are significantly different as well (even though this can be very much offset by the memory available to the database - it doesn't need to change the data files immediately; if you want to know more about this, look up something about transaction logs).

like image 115
Luaan Avatar answered Oct 14 '22 00:10

Luaan