I have tried searching this before asking but every result I have found mentions GUIDs as a PK which is not the case here.
I have a database that's using INT as the PK on all tables. However the data is accessed via API calls and a requirement was for the INT value not to be returned or used in any API. Therefore I thought of having an extra column on the tables containing a GUID.
Now my question is, if I Index the GUID column what kind of performance impact will this have? Would it be positive or negative? Bear in mind the GUID is NOT a PK or FK.
Having a guid column is perfectly ok like any varchar column as long as you do not use it as PK part and in general as a key column to join tables. Your database must have its own PK elements, filtering and joining data using them - filtering also by a GUID afterwards is perfectly ok.
In short, use GUIDs when: You want the data to be uniquely-identified, no matter where it came from. You need to be able to combine data from difference sources with little-to-no chance of duplicate GUIDs. You don't want or don't care about the users needing to remember an ID themselves.
The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table.
An INT is certainly much easier to read when debugging, and much smaller. I would, however, use a GUID or similar as a license key for a product. You know it's going to be unique, and you know that it's not going to be sequential.
I think you are on the right track, but don't take it from me...
In the comments section on one of Kimberly Tripp's articles, she responds to a comment that advocates the opposite of your position, and she disagrees and argues for the same solution you are proposing (nonclustered indexed guid with a clustered int/bigint primary key).
Herman:
If the GUID is the intrinsic identifier for the entity being modelled (i.e. used by selects) then it should be the clustered primary key without question. The reason is that adding a surrogate identity key (of int or bigint) and demoting the GUID primary key to a column with an index/unique constraint requires 2 indexes to be maintained and slows down, in my experience, by a factor of 2.
Kimberly Tripp
Hey there Herman – Actually, I disagree. For point-based queries using a nonclustered index does not add a significant amount of costly IOs. And, the maintenance of a nonclustered index that’s heavily fragmented is a lot cheaper than the required maintenance on a heavily fragmented clustered index. Additionally, the GUID might make your nonclustered indexes unnecessarily wide – making them take: more log space, more disk space, more cache as well as adding time on insert and access (especially in larger queries/joins).
So, while you might not feel like an arbitrary/surrogate key is useful (because you never directly query against it) it can be incredibly efficient to use indirectly through your nonclustered indexes. There’s definitely an element of “it depends” here but if you have even just a few nonclustered indexes then it’s likely to be more beneficial than negative and often significantly so.Cheers,
kt ~ GUIDs as PRIMARY KEYs and/or the clustering key - Kimberly L. Tripp
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With