
Unique identifier (guid) as primary key in database design

Our data resides in a SQL Server 2008 database, and there will be a lot of queries and joins between tables. We are having an argument inside the team: some argue that an integer identity is better for performance, and others argue for a GUID (uniqueidentifier).

Does the performance really suffer that badly using a GUID as a primary key?

TOMMY WANG asked Mar 15 '12 19:03



1 Answer

A 128-bit GUID (uniqueidentifier) key is of course 4x larger than a 32-bit int key. However, there are a few key advantages:

  • No "IDENTITY INSERT" issue when merging content
  • If you use a COMB value (a GUID with some of its bytes replaced by a timestamp) instead of NEWSEQUENTIALID(), you get a "free" INSERT timestamp. You can even SELECT from the primary key based on a date/time range if you want with a few fancy CAST() calls.
  • They are globally unique, which turns out to be pretty handy now and then.
  • Since there's no need to track high-water marks, your business logic layer can assign the value rather than SQL Server, thus eliminating the step of SELECT scope_identity() to get the primary key after an insert.
  • If it's even remotely possible that you could have more than 2 billion records, you'll need to use bigint (64 bits) instead of int. Once you do that, uniqueidentifier is only twice as big as a bigint.
  • Using GUIDs makes it safer to expose keys in URLs, etc. without exposing yourself to "guess-the-ID" attacks.
  • Because SQL Server loads data from disk a page at a time and processors are now mostly 64-bit, a number that is 128 bits instead of 32 doesn't take 4x longer to compare. The last test I saw showed that GUIDs are nearly as fast.
  • Index size depends on how many columns are included. Even though the GUIDs themselves are larger, the extra 8 or 12 bytes may be insignificant compared to the other columns in the index.
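To make the COMB and "assign the key in the business logic layer" points concrete, here is a minimal Python sketch. It is not the answer author's code and not SQL Server's own algorithm: the names new_comb and comb_timestamp are hypothetical, and the timestamp format (Unix milliseconds in the last 6 bytes) is an assumption chosen for simplicity. The timestamp goes last because SQL Server gives the final 6 bytes of a uniqueidentifier the most weight when ordering.

```python
import os
import uuid
from datetime import datetime, timedelta, timezone

# Assumed epoch for this sketch; other COMB variants use a different base date.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def new_comb(now=None):
    """Build a 16-byte key: 10 random bytes followed by a 6-byte timestamp.

    Hypothetical helper. It does not set the UUID version/variant bits,
    so the result is GUID-shaped rather than a standards-compliant UUID.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Whole milliseconds since EPOCH; fits in 48 bits for thousands of years.
    millis = (now - EPOCH) // timedelta(milliseconds=1)
    random_part = os.urandom(10)
    time_part = millis.to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


def comb_timestamp(value):
    """Recover the embedded creation time from a key made by new_comb."""
    millis = int.from_bytes(value.bytes[10:], "big")
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    key = new_comb()  # assigned by the application, no round trip needed
    print(key, comb_timestamp(key).isoformat())
```

Because the key is generated before the INSERT, the caller already knows it and has no need to read it back. Note that embedding a timestamp makes part of the key predictable, which is worth weighing against the "guess-the-ID" point above.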

In the end, squeezing out some small performance advantage by using integers may not be worth losing the advantages of a GUID. Test it empirically and decide for yourself.
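Before measuring on a real server, the size arguments above can be put into rough numbers. The sketch below is only arithmetic on the key widths (4, 8 and 16 bytes); the function name entry_growth and the example of 96 bytes of other columns are assumptions for illustration, and the calculation ignores row overhead, page layout and anything specific to your schema.

```python
# Storage width in bytes of each candidate key type.
KEY_BYTES = {"int": 4, "bigint": 8, "uniqueidentifier": 16}


def entry_growth(other_bytes, key_type, baseline="int"):
    """Fractional growth of one index entry when key_type replaces baseline.

    other_bytes is the combined width of the other columns in the entry
    (a hypothetical figure you would take from your own schema).
    """
    base = other_bytes + KEY_BYTES[baseline]
    new = other_bytes + KEY_BYTES[key_type]
    return (new - base) / base


if __name__ == "__main__":
    # Key alone: the GUID is 4x the int, i.e. 300% growth.
    print(f"{entry_growth(0, 'uniqueidentifier'):.0%}")
    # Against bigint the GUID is only twice as big, i.e. 100% growth.
    print(f"{entry_growth(0, 'uniqueidentifier', 'bigint'):.0%}")
    # With 96 bytes of other columns, the extra 12 bytes add 12%.
    print(f"{entry_growth(96, 'uniqueidentifier'):.0%}")
```

This only estimates storage. Query and insert timings still have to be measured against your own tables and workload.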

Personally, I still use both, depending on the situation, but the deciding factor has never really come down to performance in my case.

richardtallent answered Oct 17 '22 03:10