Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Primary key in an Azure SQL database

I'm working on a distributed system that uses CQRS and DDD principles. Based on that I decided that the primary keys of my entities should be guids, which are generated by my domain (and not by the database).

I have been reading about guids as primary keys. However, it seems that some of the best practices are not valid anymore if applied to Azure SQL database.

  1. Sequential guids are nice if you use an on premise SQL server machine - the sequential guids that are generated will always be unique. However, on Azure, this is not the case anymore. As discussed in this thread, it's not even supported anymore; and generating them is also a bad idea as it becomes a single point of failure and it will not be guaranteed unique anymore across servers. I guess sequential guids don't make sense on Azure, so I should stick to regular guids. Is this correct?

  2. Columns of type Guid are bad candidates for clustering. But this article states that this is not the case on Azure, and this one suggests the opposite! Which one should I believe? Should I just make my primary key a guid and leave it clustered (as it is the default for primary keys); or should I not make it clustered and choose another column for clustering?

Thanks for any insight!

like image 590
L-Four Avatar asked Nov 01 '22 16:11

L-Four


2 Answers

the sequential guids that are generated will always be unique. However, on Azure, this is not the case anymore.

Have a look at the bottom of this post here - http://blogs.msdn.com/b/sqlazure/archive/2010/05/05/10007304.aspx

The issue with Guid's (which rely on NEWID()) is that they will be randomly distributed which has performance issues when it comes to applying a clustered index to them.

What I'd suggest is that you use a GUID for your Primary Key. Then remove the default clustered index from that column. Apply the Clustered Index to some other field on your table (i.e. the created date) so that the records will be sequentially/contiguously indexed as they are created. And then apply a non-clustered index to your PK Guid Column.

Chances are, that will be fine from a *SELECT * FROM TABLE WHERE Id = " point of view for returning single instances.

Similarly, if you're returning lists or ranges of records for display in a list, if you specifiy the default order by CreatedDate, your clustered index will work for that

like image 162
Eoin Campbell Avatar answered Nov 08 '22 06:11

Eoin Campbell


Considering the following

  1. Sql Azure requires a clustered index to perform replication. Note, the index does not have to be unique. http://blogs.msdn.com/b/sqlazure/archive/2010/05/12/10011257.aspx

  2. The advantage of a clustered index is that range queries on the index are performed optimally with minimum seeks.

  3. The disadvantages of a clustered index is that, if data is added in out of sequence order, page split may occur and inserts may be relatively slower.

Referencing the above, I suggest the following

  1. If you have a real key range you need to query upon, for example date, sequential number etc
    1. create a (unique/non-unique) clustered index for that key.
    2. create an additional unique index with domain generated GUIDs.
  2. If no real key range exists, just create the clustered unique index with domain generated GUIDs. (The overheads of adding a fake unneeded clustered index would be more of a hindrance than a help.)
like image 35
hocho Avatar answered Nov 08 '22 07:11

hocho