SQL Server database with clustered GUID PKs - switch clustered index or switch to sequential (comb) GUIDs?

Tags: guid, sql-server, uniqueidentifier, clustered-index

We have a database in which all the PKs are GUIDs, and most of the PKs are also the clustered index for the table. We know that this is bad (due to the random nature of GUIDs). So it seems there are basically two options here, short of throwing out GUIDs as PKs altogether, which we cannot do (at least not at this time).

We could change the GUID generation algorithm to e.g. the one that NHibernate uses, as detailed in this post, or
we could, for the tables that are under the heaviest use, change to a different clustered index, e.g. an IDENTITY column, and keep the "random" GUIDs as PKs.

Is it possible to give any general recommendations in such a scenario?

The application in question has 500+ tables, the largest one presently at about 1.5 million rows, a few tables at around 500,000 rows, and the rest significantly smaller (most of them well below 10K).

Furthermore, the application is already installed at several customer sites, so we have to take any possible negative effects for existing customers into consideration.

Thanks!

asked Apr 09 '10 08:04

Eyvind

2 Answers

My opinion is clear: use an INT IDENTITY for your clustering key. That's by far the best clustering key, because it's:

small
stable (should never change)
unique
ever increasing

Sequential GUIDs are definitely a lot better than regular random GUIDs, but they're still four times larger than an INT (16 vs. 4 bytes), and this will be a factor if you have lots of rows in your table and lots of non-clustered indexes on that table, too. The clustering key is added to each and every non-clustered index, so that significantly amplifies the cost of 16 vs. 4 bytes. More bytes means more pages on disk and in SQL Server RAM, and thus more disk I/O and more work for SQL Server.
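
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the clustering key is stored once per row in the clustered index and once per row in every non-clustered index, and it ignores row headers, page overhead, and fill factor. The row count comes from the question's largest table; the figure of five non-clustered indexes is purely hypothetical.

```python
GUID_BYTES = 16  # size of a uniqueidentifier
INT_BYTES = 4    # size of an int


def clustering_key_bytes(rows, nonclustered_indexes, key_bytes):
    """Bytes spent on the clustering key alone: one copy per row in the
    clustered index plus one copy per row in each non-clustered index."""
    return rows * key_bytes * (1 + nonclustered_indexes)


rows = 1_500_000          # largest table mentioned in the question
nonclustered_indexes = 5  # hypothetical figure, for illustration only

guid_total = clustering_key_bytes(rows, nonclustered_indexes, GUID_BYTES)
int_total = clustering_key_bytes(rows, nonclustered_indexes, INT_BYTES)
extra = guid_total - int_total

print(f"GUID clustering key: {guid_total:,} bytes")
print(f"INT clustering key:  {int_total:,} bytes")
print(f"Extra for GUID:      {extra:,} bytes")
```

The gap grows linearly with both the row count and the number of non-clustered indexes, which is why the effect matters most on the large, heavily indexed tables.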

You can definitely keep the GUID as the primary key, where appropriate, but in that case I'd strongly recommend adding a separate INT IDENTITY column to that table and making that INT the clustering key. I've done that myself with a number of large tables, and the results are astonishing: table fragmentation dropped from 99 percent or more to a few percent, and performance is much better.
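
The fragmentation effect can be illustrated with a toy model of index leaf pages. This is only a sketch under simplifying assumptions, not SQL Server's storage engine: each page holds a hypothetical 100 keys, a full page that receives a key in the middle of the index is split in half, and a key that lands past the end of the last page just starts a new page. Random 128-bit integers stand in for random GUIDs and a simple counter stands in for an IDENTITY column.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical number of keys per page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one into sorted 'pages'.

    Returns (page_count, splits, average_fill)."""
    pages = [[]]              # each page is a sorted list of keys
    lows = [float("-inf")]    # lows[i] is the lower bound of pages[i]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key) - 1
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
            continue
        if i == len(pages) - 1 and key > page[-1]:
            # Insert past the end of the index: start a new page,
            # no existing rows have to move.
            pages.append([key])
            lows.append(key)
            continue
        # Insert into a full page in the middle: split it in half.
        splits += 1
        mid = capacity // 2
        right = page[mid:]
        del page[mid:]
        pages.insert(i + 1, right)
        lows.insert(i + 1, right[0])
        bisect.insort(page if key < right[0] else right, key)
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return len(pages), splits, fill


n = 20_000
rng = random.Random(42)  # fixed seed so runs are repeatable

seq_pages, seq_splits, seq_fill = simulate_inserts(range(n))
rand_pages, rand_splits, rand_fill = simulate_inserts(
    rng.getrandbits(128) for _ in range(n)
)

print(f"increasing keys: {seq_pages} pages, {seq_splits} splits, fill {seq_fill:.0%}")
print(f"random keys:     {rand_pages} pages, {rand_splits} splits, fill {rand_fill:.0%}")
```

With ever-increasing keys the model never splits a page and the pages fill completely; with random keys it splits constantly and leaves pages partly empty, so the same rows occupy more pages.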

Check out Kimberly Tripp's excellent series on why GUIDs are bad as clustering keys in SQL Server here:

GUIDs as PRIMARY KEYs and/or the clustering key
The Clustered Index Debate Continues...
Ever-increasing clustering key - the Clustered Index Debate..........again!

Marc

answered Sep 28 '22 14:09

marc_s

If you are able to change your GUID generation to sequential GUID generation easily, then that is probably your quick-win option. Sequential GUIDs will largely stop the fragmentation on the table while remaining your clustered index. The major downside of sequential GUIDs, though, is that they become guessable, which is often not desired, since being hard to guess is often the reason GUIDs are used in the first place.
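
For illustration, a sequential (comb-style) GUID can be sketched as follows. This is a simplified Python version of the general idea, not NHibernate's exact algorithm: it keeps 10 random bytes and overwrites the last 6 bytes with a millisecond timestamp, on the assumption that SQL Server gives the final 6-byte group the most weight when ordering uniqueidentifier values. The function name comb_guid and the now_ms parameter are made up for this example.

```python
import os
import time
import uuid


def comb_guid(now_ms=None):
    """Build a comb-style GUID: 10 random bytes followed by a
    6-byte big-endian millisecond timestamp (simplified sketch)."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    random_part = os.urandom(10)
    # Keep only the low 48 bits so the timestamp fits in 6 bytes.
    time_part = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


first = comb_guid()
time.sleep(0.002)  # make sure the second value gets a later timestamp
second = comb_guid()

print(first)
print(second)
# The last 12 hex digits (the timestamp tail) increase over time.
print(first.bytes[10:] < second.bytes[10:])
```

The timestamp tail is what keeps new values roughly in insert order, and it is also the part that makes them easier to guess.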

If you go down the IDENTITY route for your clustered primary key and then just have an index on your GUID column, you will still get a lot of fragmentation on the GUID index. However, the fact that the table itself will no longer get fragmented will be a massive gain.

Finally, I know you said you can't do this for now, but if you don't NEED to use GUIDs as an index at all, then you remove all of these problems.

answered Sep 28 '22 15:09

Robin Day
