I'm working on a distributed system that uses CQRS and DDD principles. Based on that I decided that the primary keys of my entities should be guids, which are generated by my domain (and not by the database).
I have been reading about guids as primary keys. However, it seems that some of the best practices are not valid anymore if applied to Azure SQL database.
Sequential guids are nice if you use an on premise SQL server machine - the sequential guids that are generated will always be unique. However, on Azure, this is not the case anymore. As discussed in this thread, it's not even supported anymore; and generating them is also a bad idea as it becomes a single point of failure and it will not be guaranteed unique anymore across servers. I guess sequential guids don't make sense on Azure, so I should stick to regular guids. Is this correct?
Columns of type Guid are bad candidates for clustering. But this article states that this is not the case on Azure, and this one suggests the opposite! Which one should I believe? Should I just make my primary key a guid and leave it clustered (as it is the default for primary keys); or should I not make it clustered and choose another column for clustering?
Thanks for any insight!
the sequential guids that are generated will always be unique. However, on Azure, this is not the case anymore.
Have a look at the bottom of this post here - http://blogs.msdn.com/b/sqlazure/archive/2010/05/05/10007304.aspx
The issue with Guid's (which rely on NEWID()
) is that they will be randomly distributed which has performance issues when it comes to applying a clustered index to them.
What I'd suggest is that you use a GUID for your Primary Key. Then remove the default clustered index from that column. Apply the Clustered Index to some other field on your table (i.e. the created date) so that the records will be sequentially/contiguously indexed as they are created. And then apply a non-clustered index to your PK Guid Column.
Chances are, that will be fine from a *SELECT * FROM TABLE WHERE Id = " point of view for returning single instances.
Similarly, if you're returning lists or ranges of records for display in a list, if you specifiy the default order by CreatedDate, your clustered index will work for that
Considering the following
Sql Azure requires a clustered index to perform replication. Note, the index does not have to be unique. http://blogs.msdn.com/b/sqlazure/archive/2010/05/12/10011257.aspx
The advantage of a clustered index is that range queries on the index are performed optimally with minimum seeks.
The disadvantages of a clustered index is that, if data is added in out of sequence order, page split may occur and inserts may be relatively slower.
Referencing the above, I suggest the following
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With