I've worked on a number of database systems in the past where moving entries between databases would have been made a lot easier if all the database keys had been GUID / UUID values. I've considered going down this path a few times, but there's always a bit of uncertainty, especially around performance and un-read-out-over-the-phone-able URLs.
Has anyone worked extensively with GUIDs in a database? What advantages would I get by going that way, and what are the likely pitfalls?
Advantages: UUID values are unique between tables and databases. Thats why it can be merge rows between two databases or distributed databases. UUID is more safer to pass through url than integer type data.
The GUID designation is an industry standard defined by Microsoft to provide a reference number which is unique in any context. UUID is a term that stands for Universal Unique Identifier. Similarly, GUID stands for Globally Unique Identifier. So basically, two terms for the same thing.
Pros. Using UUID for a primary key brings the following advantages: UUID values are unique across tables, databases, and even servers that allow you to merge rows from different databases or distribute databases across servers. UUID values do not expose the information about your data so they are safer to use in a URL.
Primary keys should never be exposed, even UUIDsA primary key is, by definition unique within its scope. It is, therefore, an obvious thing to use as a customer number, or in a URL to identify a unique page or row. Don't! I would argue that using a PK in any public context is a bad idea.
Advantages:
Disadvantages:
Personally, I use them for most PK's in any system of a decent size, but I got "trained" on a system which was replicated all over the place, so we HAD to have them. YMMV.
I think the duplicate data thing is rubbish - you can get duplicate data however you do it. Surrogate keys are usually frowned upon where ever I've been working. We DO use the WordPress-like system though:
UPDATE: So this one gets +1'ed a lot, and I thought I should point out a big downside of GUID PK's: Clustered Indexes.
If you have a lot of records, and a clustered index on a GUID, your insert performance will SUCK, as you get inserts in random places in the list of items (thats the point), not at the end (which is quick)
So if you need insert performance, maybe use a auto-inc INT, and generate a GUID if you want to share it with someone else (ie, show it to a user in a URL)
Why doesn't anyone mention performance? When you have multiple joins, all based on these nasty GUIDs the performance will go through the floor, been there :(
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With