
SQL GUID Primary Key Join Performance

I'm currently using GUIDs as a NONCLUSTERED PRIMARY KEY alongside an INT IDENTITY column.

The GUIDs are required to allow offline creation of data and synchronisation - which is how the entire database is populated.

I'm aware of the implications of using a GUID as a clustered primary key, hence the clustered index on the integer column. But does using a GUID as the primary key, and therefore as the foreign key on other tables, have significant performance implications?

Would a better option be to use an integer primary/foreign key, and keep the GUID as a client ID with a UNIQUE INDEX on each table? My concern there is that, without significant alteration to the existing code, Entity Framework would require loading the navigation properties in order to get the GUID of the related entity.

The database/hardware in question is SQL Azure.

Asked by Jamie, Nov 29 '13 14:11





1 Answer

You can also create foreign keys against unique key constraints, which gives you the option of pointing foreign keys at the ID identity column instead of the GUID.

For example:

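CREATE TABLE SomeTable
(
    UUID UNIQUEIDENTIFIER NOT NULL,
    ID INT IDENTITY(1,1) NOT NULL,

    -- The GUID stays the primary key; constraint names must be unique per schema
    CONSTRAINT PK_SomeTable PRIMARY KEY NONCLUSTERED (UUID),
    -- A unique constraint on ID makes it a valid foreign key target
    CONSTRAINT UQ_SomeTable_ID UNIQUE (ID)
)
GO

CREATE TABLE AnotherTable
(
    SomeTableID INT,

    -- References the unique INT column rather than the GUID primary key
    FOREIGN KEY (SomeTableID) REFERENCES SomeTable(ID)
)
GO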

Edit

Assuming that your centralized database is a mart and that only batch ETL is done from the source databases: if you run the ETL directly against the central database (i.e. not via Entity Framework), all your tables will still have UUID FKs after re-population from the distributed databases. You'll therefore need to either map the INT unique keys during ETL or fix them up after the import (which would require temporarily setting the INT FKs to NOCHECK).
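The "fix up after the import" route could look roughly like the sketch below. It is only an illustration under assumptions: it supposes AnotherTable also carries the parent's GUID as loaded from the source (a hypothetical column, SomeTableUUID) and that the INT foreign key has been given the name FK_AnotherTable_SomeTable, neither of which is in the example above.

```sql
-- Assumptions (not in the original schema): AnotherTable.SomeTableUUID holds
-- the parent's GUID from the source database, and the INT foreign key is
-- named FK_AnotherTable_SomeTable.

-- 1. Stop enforcing the INT foreign key while unmapped rows are loaded.
ALTER TABLE AnotherTable NOCHECK CONSTRAINT FK_AnotherTable_SomeTable;
GO

-- 2. ... bulk load AnotherTable here; SomeTableID is not yet reliable ...

-- 3. Map the INT key by matching child rows to their parent on the GUID.
UPDATE a
SET    a.SomeTableID = s.ID
FROM   AnotherTable AS a
JOIN   SomeTable    AS s ON s.UUID = a.SomeTableUUID;
GO

-- 4. Re-enable the constraint and re-validate existing rows
--    so that the constraint is trusted again.
ALTER TABLE AnotherTable WITH CHECK CHECK CONSTRAINT FK_AnotherTable_SomeTable;
GO
```

The WITH CHECK in the last step matters: re-enabling with a plain CHECK CONSTRAINT leaves the existing rows unverified, and the optimizer treats the constraint as untrusted.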

Once the ETL load is complete and the INT keys are mapped, I would suggest you ignore or remove the UUIDs from your ORM model. You would then need to regenerate your EF navigation properties on the INT keys.

A different solution would be required if you update the central database directly, or if you do continual ETL and use EF for the ETL itself. In that case, it might be less total I/O to simply keep the GUID PKs as the FKs for referential integrity, drop the INT FKs altogether, and choose other suitable columns for clustering (minimizing page reads).
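As a rough sketch of that second layout: GUIDs serve as both primary and foreign keys, and the clustered index goes on a different column. The table names and the CreatedAt column are hypothetical, chosen only for illustration. Which column actually suits clustering depends on how your data is queried.

```sql
-- Hypothetical tables; CreatedAt stands in for whatever column best matches
-- the common range/scan access pattern.
CREATE TABLE ParentTable
(
    UUID      UNIQUEIDENTIFIER NOT NULL,
    CreatedAt DATETIME2        NOT NULL,

    CONSTRAINT PK_ParentTable PRIMARY KEY NONCLUSTERED (UUID)
);
GO

-- Cluster on the non-GUID column instead of the random GUID.
CREATE CLUSTERED INDEX CX_ParentTable_CreatedAt ON ParentTable (CreatedAt);
GO

CREATE TABLE ChildTable
(
    UUID       UNIQUEIDENTIFIER NOT NULL,
    ParentUUID UNIQUEIDENTIFIER NOT NULL,
    CreatedAt  DATETIME2        NOT NULL,

    CONSTRAINT PK_ChildTable PRIMARY KEY NONCLUSTERED (UUID),
    -- Referential integrity is enforced directly on the GUID
    CONSTRAINT FK_ChildTable_ParentTable
        FOREIGN KEY (ParentUUID) REFERENCES ParentTable (UUID)
);
GO

CREATE CLUSTERED INDEX CX_ChildTable_CreatedAt ON ChildTable (CreatedAt);
GO

-- Nonclustered index so joins from parent to child can seek on the GUID.
CREATE NONCLUSTERED INDEX IX_ChildTable_ParentUUID ON ChildTable (ParentUUID);
GO
```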

Answered by StuartLC, Sep 30 '22 17:09
