I would like optimize the performance of a database that my team is using for an application.
I have been looking for areas to add foreign keys, and in turn index those columns to improve the performance of joins. However, many of our tables are joined on an id that is a GUID
type, generated upon insertion of an item, and the data associated with that item in other tables is generally has column item_id
containing the GUID.
I have read that adding clustered indexes to GUID type columns is a very bad decision because the index will need to be constantly reconstructed in order to be effective. However, I was wondering, is there any detriment to utilizing a non-clustered index in the scenario described above? Or is it reasonable to assume that it would help performance? I can provide more information if needed.
The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table.
Having a guid column is perfectly ok like any varchar column as long as you do not use it as PK part and in general as a key column to join tables. Your database must have its own PK elements, filtering and joining data using them - filtering also by a GUID afterwards is perfectly ok.
The globally unique identifier (GUID) data type in SQL Server is represented by the uniqueidentifier data type, which stores a 16-byte binary value. A GUID is a binary number, and its main use is as an identifier that must be unique in a network that has many computers at many sites.
A clustered index may be the fastest for one SELECT statement but it may not necessarily be correct choice. SQL Server indices are b-trees. A non-clustered index just contains the indexed columns, with the leaf nodes of the b-tree being pointers to the approprate data page.
An index on a <anytype>
is by far the best option you have to improve joins and singleton lookups. Lacking this index the query will always have to scan the entire table end-to-end with (often) abysmal performance results and concurrency gone down the drain.
It is true that uniqueidentifier
makes poor choice for indexes for the reasons you mention, but by no means does that implies that you should not create these indexes. Changing the data type to INT or BIGINT would be advisable, if possible. Using NEWSEQUENTIALID()
or UuidCreateSequential
to generate them would help with fragmentation issues. If all alternatives fail you may have to do index maintenance (Rebuild, reorganize) operations more often than for other indexes. But by no means do any of these drawbacks outweigh the benefit of having the index in the first place!
Two performance:
- insert
- select
An index should improve select
An index will slow slow down insert.
If the inserts are in order the index does not fragment.
If the inserts are not in order the index will fragment.
Index fragmentation slows down both insert and select.
Via maintenance can defragment the index.
Adding an non-clustered index to the column that references a FK will help the joins.
Since that column is most likely not ordered that fact it is a GUID is of no loss.
On the FK table itself is where GUID is not a good candidate for a PK (clustered index).
With GUID as PK that index fragments on insert.
Int or sequential ID are better candidates as they would not fragment the PK on insert.
But no big deal just defragment those tables.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With