Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance impact of using GUID in SQL Server

I have tried searching this before asking but every result I have found mentions GUIDs as a PK which is not the case here.

I have a database that's using INT as the PK on all tables. However the data is accessed via API calls and a requirement was for the INT value not to be returned or used in any API. Therefore I thought of having an extra column on the tables containing a GUID.

Now my question is, if I Index the GUID column what kind of performance impact will this have? Would it be positive or negative? Bear in mind the GUID is NOT a PK or FK.

like image 877
niko619 Avatar asked Mar 23 '17 21:03

niko619


People also ask

Is it good to use GUID as primary key?

Having a guid column is perfectly ok like any varchar column as long as you do not use it as PK part and in general as a key column to join tables. Your database must have its own PK elements, filtering and joining data using them - filtering also by a GUID afterwards is perfectly ok.

When should I use GUID?

In short, use GUIDs when: You want the data to be uniquely-identified, no matter where it came from. You need to be able to combine data from difference sources with little-to-no chance of duplicate GUIDs. You don't want or don't care about the users needing to remember an ID themselves.

Why you should never use GUIDs as part of Clustered Index?

The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table.

Should I use int or GUID as primary key?

An INT is certainly much easier to read when debugging, and much smaller. I would, however, use a GUID or similar as a license key for a product. You know it's going to be unique, and you know that it's not going to be sequential.


1 Answers

I think you are on the right track, but don't take it from me...

In the comments section on one of Kimberly Tripp's articles, she responds to a comment that advocates the opposite of your position, and she disagrees and argues for the same solution you are proposing (nonclustered indexed guid with a clustered int/bigint primary key).

Herman:

If the GUID is the intrinsic identifier for the entity being modelled (i.e. used by selects) then it should be the clustered primary key without question. The reason is that adding a surrogate identity key (of int or bigint) and demoting the GUID primary key to a column with an index/unique constraint requires 2 indexes to be maintained and slows down, in my experience, by a factor of 2.


Kimberly Tripp

Hey there Herman – Actually, I disagree. For point-based queries using a nonclustered index does not add a significant amount of costly IOs. And, the maintenance of a nonclustered index that’s heavily fragmented is a lot cheaper than the required maintenance on a heavily fragmented clustered index. Additionally, the GUID might make your nonclustered indexes unnecessarily wide – making them take: more log space, more disk space, more cache as well as adding time on insert and access (especially in larger queries/joins).

So, while you might not feel like an arbitrary/surrogate key is useful (because you never directly query against it) it can be incredibly efficient to use indirectly through your nonclustered indexes. There’s definitely an element of “it depends” here but if you have even just a few nonclustered indexes then it’s likely to be more beneficial than negative and often significantly so.

Cheers,
kt ~ GUIDs as PRIMARY KEYs and/or the clustering key - Kimberly L. Tripp

like image 154
SqlZim Avatar answered Sep 29 '22 20:09

SqlZim