
Is Guid the best identity datatype for Databases?

The question relates to BI and to merging data from different data sources; GUIDs would make that process smoother.

Also, is there an optimal strategy for migrating a database without GUIDs to a version with GUIDs, without losing information?

bovium asked Dec 12 '08 08:12


2 Answers

Keep in mind that GUIDs (the uniqueidentifier type in SQL Server) are a bad choice for PKs, because many PKs have a clustered index (so all rows are stored on disk in index order). Since GUIDs are random, a new row is not guaranteed to be appended at the end of the index; it may have to be inserted in the middle. This causes disk thrashing, as existing rows have to be moved to make room.
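To make the effect described above concrete, here is a minimal Python sketch. It is only a model: a sorted list stands in for a clustered index, and the function name count_mid_index_inserts is made up for this illustration. It counts how many inserts land somewhere other than the end of the index, for random GUIDs versus an incrementing integer key.

```python
import bisect
import uuid


def count_mid_index_inserts(keys):
    """Insert keys one by one into a sorted list (a stand-in for a
    clustered index) and count how many do NOT land at the end."""
    index = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos < len(index):
            mid_inserts += 1  # existing entries must shift to make room
        index.insert(pos, key)
    return mid_inserts


n = 1000
random_keys = [uuid.uuid4().bytes for _ in range(n)]  # random GUIDs
sequential_keys = list(range(n))                      # identity-style keys

random_mid = count_mid_index_inserts(random_keys)
sequential_mid = count_mid_index_inserts(sequential_keys)
print("random GUIDs, inserts not at the end:", random_mid)
print("sequential keys, inserts not at the end:", sequential_mid)
```

The sequential keys always append, while nearly all of the random GUIDs land in the middle. A real database engine works with pages rather than a flat list, so treat this as an illustration of the ordering problem, not a benchmark.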

If you do consider GUIDs, at least use SQL Server 2005 or later and NEWSEQUENTIALID() for the PK value. It produces sequential GUIDs, each greater than the previous one generated on that machine, so new rows are appended at the end of the index. If you're not using SQL Server (for example, you use PostgreSQL, or Oracle with CHAR(32) or another type), consider COMBs (see: http://www.informit.com/articles/article.aspx?p=25862 )
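As a rough sketch of the COMB idea (a GUID that combines random bytes with a timestamp), the Python below builds a 16-byte value from 10 random bytes followed by a 6-byte millisecond timestamp. The exact layout, the function names comb_guid and sql_server_style_sort_key, and the sort key are assumptions made for this example: the sort key is a simplified approximation of an ordering in which the trailing 6 bytes are the most significant. On a database that compares GUIDs byte by byte from the front, the timestamp would go first instead.

```python
import os
import time
import uuid


def comb_guid(now_ms=None):
    """Return a COMB-style GUID: 10 random bytes followed by a 6-byte
    big-endian millisecond timestamp (layout assumed for this sketch;
    the UUID version bits are not set)."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    timestamp = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=os.urandom(10) + timestamp)


def sql_server_style_sort_key(guid):
    """Simplified, assumed ordering: trailing 6 bytes compared first."""
    raw = guid.bytes
    return raw[10:] + raw[:10]


# GUIDs generated at increasing timestamps sort in generation order.
guids = [comb_guid(now_ms=1_700_000_000_000 + i) for i in range(5)]
keys = [sql_server_style_sort_key(g) for g in guids]
print(keys == sorted(keys))
```

Two COMBs generated within the same millisecond are ordered only by their random part, so the values are roughly sequential, not strictly so.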

Frans Bouma answered Sep 18 '22 05:09


Edited after reading Frans Bouma's answer, since my answer has been accepted and therefore moved to the top. Thanks, Frans.

GUIDs do make a good unique value; however, due to their complex nature they're not really human-readable, which can make support difficult. If you're going to use GUIDs, consider doing some performance analysis on bulk data operations before you make your choice. Bear in mind that if your primary key is clustered, GUIDs are not appropriate.

This is because a clustered index keeps the rows physically ordered by key in the table. Since GUIDs are random, almost every insert would land in the middle of the table and require existing rows to be moved to make way for the new row.

Personally I like to have two "keys" on my data:

1) Primary key
Unique, numeric values with a clustered primary key. This is my system's internal ID for each row, and is used to uniquely identify a row and in foreign keys.

Identity columns can cause trouble if you're using database replication (SQL Server will add a "rowguid" column automatically for merge-replicated tables), because the identity seed is maintained per server instance, so two servers can hand out the same value and you'd get duplicates.

2) External Key/External ID/Business ID
Often it is also preferable to have the additional concept of an "external ID". This is often a character field with a unique constraint (possibly including another column e.g. customer identifier).

This would be the value used by external interfaces and would be exposed to customers (who do not recognise your internal values). This "business ID" allows customers to refer to your data using values that mean something to them.
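The two-key layout described above could look roughly like the following sketch. It uses Python's built-in sqlite3 module as a stand-in for whatever database you actually run, and the table and column names (orders, customer_id, external_id) are hypothetical. The integer id is the internal key; the external ID is unique per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,  -- internal key, used in foreign keys
        customer_id INTEGER NOT NULL,
        external_id TEXT    NOT NULL,     -- business ID exposed to customers
        UNIQUE (customer_id, external_id)
    )
    """
)

insert = "INSERT INTO orders (customer_id, external_id) VALUES (?, ?)"
conn.execute(insert, (1, "PO-1001"))
conn.execute(insert, (2, "PO-1001"))  # allowed: a different customer

# The same external ID for the same customer violates the constraint.
try:
    conn.execute(insert, (1, "PO-1001"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# External interfaces look rows up by business ID, never by internal id.
row = conn.execute(
    "SELECT id FROM orders WHERE customer_id = ? AND external_id = ?",
    (2, "PO-1001"),
).fetchone()
print(duplicate_rejected, row[0])
```

Internal joins and foreign keys use id, while anything customer-facing goes through the (customer_id, external_id) pair.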

Neil Barnwell answered Sep 20 '22 05:09