I'm fairly well versed in SQL Server performance, but I constantly have to argue down the idea that GUIDs should be used as the default type for clustered primary keys.
Assuming that the table has a fairly low number of inserts per day (roughly 5,000 rows/day), what kind of performance issues could we run into? How will page splits affect our seek performance? How often should I reindex (or should I defrag)? What should I set the fill factors to (100, 90, 80, etc.)?
What if I were inserting 1,000,000 rows per day?
I apologize beforehand for all of the questions, but I'm looking to get some backup for not using GUIDs as our default for PKs. I am, however, completely open to having my mind changed by the overwhelming knowledge of the StackOverflow user base.
GUIDs may seem to be a natural choice for your primary key, and if you really must, you could probably argue for using one as the PRIMARY KEY of the table. What I'd strongly recommend against is using the GUID column as the clustering key, which SQL Server does by default unless you specifically tell it not to.
The problem with a clustered index on a GUID column is that the GUIDs are random, so a new record usually has to be inserted into the middle of the table rather than at the end. When the target page is already full, SQL Server has to split it and move part of its rows to a new page to make room.
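To get a feel for the effect, here is a toy Python model of the leaf level of an index. It is only a sketch under stated assumptions, not SQL Server's actual storage engine: `PAGE_CAPACITY` is a made-up number of rows per page, random 128-bit integers stand in for `NEWID()` values, and increasing integers stand in for an identity or sequential GUID column. The model splits a full page in half on a mid-index insert and simply starts a fresh page when the insert lands at the very end of the index.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical rows per leaf page (toy value, not a SQL Server figure)


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert non-negative integer keys into sorted leaf pages and report split statistics."""
    pages = [[]]   # leaf pages in key order, each a sorted list of keys
    lows = [-1]    # lows[i] is a lower bound for every key stored on pages[i]
    splits = 0
    rows_moved = 0
    for key in keys:
        # Find the page whose key range contains the new key.
        i = bisect.bisect_right(lows, key) - 1
        page = pages[i]
        if len(page) >= capacity:
            if i == len(pages) - 1 and key > page[-1]:
                # Append at the end of the index: start a new page, move nothing.
                pages.append([key])
                lows.append(key)
                continue
            # Mid-index insert into a full page: move the upper half to a new page.
            half = len(page) // 2
            new_page = page[half:]
            del page[half:]
            pages.insert(i + 1, new_page)
            lows.insert(i + 1, new_page[0])
            splits += 1
            rows_moved += len(new_page)
            if key >= new_page[0]:
                page = new_page
        bisect.insort(page, key)
    total_rows = sum(len(p) for p in pages)
    return {
        "pages": len(pages),
        "splits": splits,
        "rows_moved": rows_moved,
        "avg_fill_pct": 100.0 * total_rows / (len(pages) * capacity),
    }


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is repeatable
    random_keys = [rng.getrandbits(128) for _ in range(10_000)]
    print("sequential:", simulate_inserts(range(10_000)))
    print("random:    ", simulate_inserts(random_keys))
```

In this model the sequential run never splits a page and leaves every page full, while the random run splits pages repeatedly and ends up with noticeably emptier pages and more of them. Real numbers depend on row size, fill factor and workload, so treat this as an illustration of the mechanism only.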
What is the issue if the clustered index is on a GUID primary key column? The purpose of the primary key is to uniquely identify every row in the table. So, there is no problem in having the GUID as a primary key.
An int is smaller, faster, easier to remember, and keeps a chronological sequence. As for a GUID, the only advantage I have found is that it is unique. In which cases would a SQL Server GUID be better than an int, and why? From what I've seen, int has no flaws except its range limit, which in many cases is irrelevant.
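To put a rough number on "smaller": a SQL Server `int` is 4 bytes and a `uniqueidentifier` is 16 bytes, and the clustering key is carried in every nonclustered index row as well as in the clustered index itself. The sketch below is back-of-the-envelope arithmetic only. The function name and parameters are hypothetical, and it ignores row overhead, page fill and compression.

```python
INT_BYTES = 4    # size of a SQL Server int
GUID_BYTES = 16  # size of a SQL Server uniqueidentifier


def extra_key_bytes(rows, nonclustered_indexes,
                    key_bytes=GUID_BYTES, baseline_bytes=INT_BYTES):
    """Estimate the extra key storage of a wide clustering key versus a narrow one.

    Assumes the key is stored once per row in the clustered index and once
    per row in each nonclustered index (hypothetical helper, rough estimate).
    """
    copies_per_row = 1 + nonclustered_indexes
    return (key_bytes - baseline_bytes) * copies_per_row * rows


if __name__ == "__main__":
    # 10 million rows with 4 nonclustered indexes: 12 bytes * 5 copies * 10,000,000 rows
    print(extra_key_bytes(10_000_000, 4))  # 600000000
```

For a small table the difference is noise. It only starts to matter as row counts and the number of nonclustered indexes grow.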
If you are doing any kind of volume, GUIDs are extremely bad as a PK unless you use sequential GUIDs, for the exact reasons you describe. Page fragmentation is severe:
                 Average                   Average
                 Fragmentation  Fragment   Fragment  Page   Average
Type             in Percent     Count      Size      Count  Space Used
id               4.35           7          16.43     115    99.89
newidguid        98.77          162        1         162    70.90
newsequentualid  4.35           7          16.43     115    99.89
And as this comparison between GUIDs and integers shows:
Test1 caused a tremendous amount of page splits, and had a scan density around 12% when I ran a DBCC SHOWCONTIG after the inserts had completed. The Test2 table had a scan density around 98%.
If your volume is very low, however, it just doesn't matter that much.
If you do really need a globally unique ID but have high volume (and can't use sequential IDs), just put the GUIDs in an indexed column.
Drawbacks of using GUID as primary key:
Advantages:
I thought the decision as to whether to use GUIDs was pretty simple, but maybe I'm unaware of other issues.
With so few inserts per day, I doubt that page splitting will be a significant factor. The real question is how 5,000 compares with the existing row count, as this is the main information needed to decide on an appropriate initial fill factor to defer splits.
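One way to turn that comparison into a starting number is the first-order estimate below. It is a sketch, not an official formula: it assumes random keys spread the new rows evenly over the existing pages, so a page filled to `ff` percent grows to about `ff * (1 + growth)` percent before the next rebuild, and it picks the largest `ff` that keeps that at or below 100. The function name and parameters are hypothetical.

```python
import math


def estimate_fill_factor(existing_rows, inserts_per_day, days_between_rebuilds):
    """Rough starting fill factor (percent) for an index on randomly distributed keys.

    Averages only: individual pages receive more or fewer rows than the mean,
    so some pages will still split before the next rebuild.
    """
    if existing_rows <= 0:
        raise ValueError("existing_rows must be positive")
    if inserts_per_day < 0 or days_between_rebuilds < 0:
        raise ValueError("inserts_per_day and days_between_rebuilds must be non-negative")
    # Fraction by which the table grows between two rebuilds.
    growth = inserts_per_day * days_between_rebuilds / existing_rows
    fill_factor = math.floor(100 / (1 + growth))
    return max(1, min(100, fill_factor))


if __name__ == "__main__":
    # 5,000 rows/day into 1,000,000 existing rows, rebuilt every 30 days
    print(estimate_fill_factor(1_000_000, 5_000, 30))      # 86
    # 1,000,000 rows/day into 10,000,000 existing rows, rebuilt weekly
    print(estimate_fill_factor(10_000_000, 1_000_000, 7))  # 58
```

This only makes sense for keys that land all over the index. With an ever-increasing key the new rows go to the end, and leaving free space on existing pages buys nothing. Either way, measure fragmentation on the real table before settling on a value.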
That said, I'm personally not a big fan of GUIDs. I understand that they can serve well in some contexts, but in many cases they are just "in the way" [of efficiency, of ease of use, of ...].
I find the following questions useful when deciding whether a GUID should be used or not.