
When are you truly forced to use UUID as part of the design?

People also ask

When should UUID be used?

The point of a UUID is to have a universally unique identifier. There are generally two reasons to use UUIDs: you do not want a database (or some other authority) to centrally control the identity of records, or there's a chance that multiple components may independently generate a non-unique identifier.

What is a UUID and when might you use it?

UUIDs are generally used for identifying information that needs to be unique within a system or a network of systems. Their uniqueness and low probability of being repeated make them useful as associative keys in databases and as identifiers for physical hardware within an organization.

Should I use UUID in database?

If your database is or will eventually be distributed, as in the case of a local-first application, or simply if your NoSQL database is scaling up and split across multiple servers, I'd say that you have almost no choice: use UUIDs! Just know that there are some things you can do to improve performance.

Is it safe to use UUID?

UUIDs are safe enough for nearly all practical purposes, and certainly for yours.


I wrote the UUID generator/parser for Ruby, so I consider myself to be reasonably well-informed on the subject. There are four major UUID versions:

Version 4 UUIDs are essentially just 16 bytes of randomness pulled from a cryptographically secure random number generator, with some bit-twiddling to identify the UUID version and variant. These are extremely unlikely to collide, but it could happen if a weak (non-cryptographic) PRNG is used or if you just happen to have really, really, really, really, really bad luck.
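To make the "bit-twiddling" concrete, here is a minimal Python sketch (not the Ruby generator mentioned above) that builds a version 4 UUID by hand. The helper name `make_uuid4` is made up for illustration; in practice the standard library's `uuid.uuid4()` does the same job.

```python
import secrets
import uuid


def make_uuid4() -> uuid.UUID:
    """Build a version 4 UUID by hand from 16 random bytes (illustrative helper)."""
    raw = bytearray(secrets.token_bytes(16))  # randomness from the OS CSPRNG
    raw[6] = (raw[6] & 0x0F) | 0x40  # high nibble of byte 6 holds the version (4)
    raw[8] = (raw[8] & 0x3F) | 0x80  # top two bits of byte 8 hold the variant (binary 10)
    return uuid.UUID(bytes=bytes(raw))


u = make_uuid4()
print(u, u.version, u.variant == uuid.RFC_4122)
```

Six of the 128 bits are fixed by the version and variant fields, which leaves 122 random bits.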

Version 5 and Version 3 UUIDs use the SHA1 and MD5 hash functions respectively, to combine a namespace with a piece of already unique data to generate a UUID. This will, for example, allow you to produce a UUID from a URL. Collisions here are only possible if the underlying hash function also has a collision.
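As a small illustration of the name-based versions, the following sketch uses Python's standard `uuid` module to derive UUIDs from a URL. The URL itself is a made-up example.

```python
import uuid

url = "https://example.com/articles/42"  # hypothetical input, any unique string works

v5 = uuid.uuid5(uuid.NAMESPACE_URL, url)  # SHA-1 based
v3 = uuid.uuid3(uuid.NAMESPACE_URL, url)  # MD5 based

# The same namespace and name always produce the same UUID.
print(v5 == uuid.uuid5(uuid.NAMESPACE_URL, url))
print(v5.version, v3.version)
```

Because the result is deterministic, two machines that never talk to each other will still agree on the UUID for the same URL.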

Version 1 UUIDs are the most common. They use the network card's MAC address (which, unless spoofed, should be unique), plus a timestamp, plus the usual bit-twiddling to generate the UUID. In the case of a machine that doesn't have a MAC address, the 6 node bytes are generated with a cryptographically secure random number generator. If two UUIDs are generated in sequence fast enough that the timestamp matches the previous UUID, the timestamp is incremented by 1. Collisions should not occur unless one of the following happens:

  1. The MAC address is spoofed.

  2. One machine running two different UUID-generating applications produces UUIDs at the exact same moment.

  3. Two machines without a network card, or without user-level access to the MAC address, are given the same random node sequence and generate UUIDs at the exact same moment.

  4. We run out of bits to represent the timestamp and roll over back to zero.
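The fields described above can be inspected directly. This is a sketch using Python's standard `uuid` module rather than the Ruby library; note that Python, too, falls back to a random node value when it cannot read a MAC address.

```python
import uuid
from datetime import datetime, timedelta, timezone

u = uuid.uuid1()

# The version 1 timestamp counts 100-nanosecond ticks since 1582-10-15 (UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

print("version:  ", u.version)
print("node:     ", f"{u.node:012x}")  # 48-bit MAC address or random fallback
print("clock seq:", u.clock_seq)       # guards against clock resets
print("created:  ", created)
```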

Realistically, none of these events occur by accident within a single application's ID space. Unless you're accepting IDs on, say, an Internet-wide scale, or with an untrusted environment where malicious individuals might be able to do something bad in the case of an ID collision, it's just not something you should worry about. It's critical to understand that if you happen to generate the same version 4 UUID as I do, in most cases, it doesn't matter. I've generated the ID in a completely different ID space from yours. My application will never know about the collision so the collision doesn't matter. Frankly, in a single application space without malicious actors, the extinction of all life on earth will occur long before you have a collision, even on a version 4 UUID, even if you're generating quite a few UUIDs per second.

Also, 2^64 IDs at 16 bytes each is 256 exabytes. As in, you would need to store 256 exabytes worth of IDs before you had a 50% chance of an ID collision in a single application space.
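The storage figure can be sanity-checked with the usual birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))). The sketch below is a rough calculation, not an exact result. It assumes the IDs are uniformly random, and the function name is made up for illustration. The 2^64 figure treats all 128 bits as random, whereas a version 4 UUID has 122 random bits, so the 50% point lands at roughly 2.7 × 10^18 IDs.

```python
import math


def ids_for_collision_probability(p: float, random_bits: int = 122) -> float:
    """Approximate count of uniformly random IDs that gives collision probability p."""
    space = 2.0 ** random_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


n_half = ids_for_collision_probability(0.5)                      # version 4: 122 random bits
n_half_128 = ids_for_collision_probability(0.5, random_bits=128)  # idealised 128 random bits

print(f"122 bits: {n_half:.3e} IDs, {n_half * 16 / 1e18:.0f} EB at 16 bytes each")
print(f"128 bits: {n_half_128:.3e} IDs, {n_half_128 * 16 / 1e18:.0f} EB at 16 bytes each")
```

Either way the conclusion is the same: the storage needed to hold that many IDs is on the order of tens to hundreds of exabytes.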


The thing that UUIDs buy you that is very difficult to do otherwise is to get a unique identifier without having to consult or coordinate with a central authority. The general problem of being able to get such a thing without some sort of managed infrastructure is the problem that UUIDs solve.

I've read that according to the birthday paradox the chance of a UUID collision occurring is 50% once 2^64 UUIDs have been generated. Now 2^64 is a pretty big number, but a 50% chance of collision seems far too risky (for example, how many UUIDs need to exist before there's a 5% chance of collision - even that seems like too large of a probability).

The problem with that analysis is twofold:

  1. UUIDs are not entirely random: version 1 UUIDs, for example, have major components that are time- and/or location-based. So to have any real chance at a collision, the colliding UUIDs need to be generated at the exact same time from different UUID generators. I'd say that while there is a reasonable chance that several UUIDs might be generated at the same time, there's enough other gunk (including location info or random bits) to make a collision between this very small set of UUIDs nearly impossible.

  2. Strictly speaking, UUIDs only need to be unique among the set of other UUIDs that they might be compared against. If you're generating a UUID to use as a database key, it doesn't matter if somewhere else, in an evil alternate universe, the same UUID is being used to identify a COM interface. Just like it'll cause no confusion if there's someone (or something) else named "Michael Burr" on Alpha-Centauri.


Everything has a non-zero chance of failure. I would concentrate on problems that are far more likely to occur (i.e. almost anything you can think of) than the collision of UUIDs.


An emphasis on "reasonably" or, as you put it, "effectively": good enough is how the real world works. The amount of computational work involved in covering that gap between "practically unique" and "truly unique" is enormous. Uniqueness is a curve with diminishing returns. At some point on that curve, there is a line between where "unique enough" is still affordable, and then we curve VERY steeply. The cost of adding more uniqueness becomes quite large. Infinite uniqueness has infinite cost.

UUID/GUID is, relatively speaking, a computationally quick and easy way to generate an ID which can be reasonably assumed to be universally unique. This is very important in many systems which need to integrate data from previously unconnected systems. For example, say you have a Content Management System which runs on two different platforms, and at some point you need to import the content from one system into the other. You don't want IDs to change, so that references between data from system A remain intact, but you also don't want any collisions with data created in system B. A UUID solves this.
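A toy version of that import scenario, as a sketch only: the record layout, the `new_article` helper and the `related` field are all made up for illustration and do not come from any particular CMS.

```python
import uuid


def new_article(title, related=None):
    """Create a record whose ID is generated locally, with no central counter."""
    return {"id": uuid.uuid4(), "title": title, "related": related}


# System A: two articles, the second referencing the first by ID.
a1 = new_article("Intro")
a2 = new_article("Follow-up", related=a1["id"])
system_a = {a["id"]: a for a in (a1, a2)}

# System B: created independently, knows nothing about system A.
b1 = new_article("Unrelated post")
system_b = {b1["id"]: b1}

# Import A into B: no ID needs to be rewritten, so A's references stay valid.
overlap = system_a.keys() & system_b.keys()
merged = {**system_b, **system_a}

print(len(overlap), len(merged), merged[a2["id"]]["related"] == a1["id"])
```

With auto-increment integers, both systems would likely have a record with ID 1, and the import would have to renumber one side and patch every reference.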


It is never absolutely necessary to create a UUID. It is, however, convenient to have a standard where offline users can each generate a key to something with a very low probability of collision.

This can aid in database replication resolution, etc.

It would be easy for online users to generate unique keys for something without the overhead or possibility of collision, but that is not what UUIDs are for.

Anyways, a word on the probability of collision, taken from Wikipedia:

To put these numbers into perspective, one's annual risk of being hit by a meteorite is estimated to be one chance in 17 billion, equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.
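Those figures can be roughly reproduced with the birthday approximation p ≈ 1 − exp(−n² / 2N). This is a back-of-the-envelope sketch that assumes version 4 UUIDs with 122 uniformly random bits. The function name, and the choice of 25 trillion as a stand-in for "a few tens of trillions", are assumptions made for illustration.

```python
import math


def collision_probability(n: float, random_bits: int = 122) -> float:
    """Approximate probability of at least one duplicate among n random IDs."""
    space = 2.0 ** random_bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-(n * n) / (2.0 * space))


seconds_per_century = 60 * 60 * 24 * 365.25 * 100
p_century = collision_probability(1e9 * seconds_per_century)  # 1 billion UUIDs per second
p_year = collision_probability(25e12)                         # 25 trillion UUIDs in a year

print(f"100 years at 1e9/s: p = {p_century:.2f}")
print(f"25 trillion in a year: about 1 in {1 / p_year:.2e}")
```

Under these assumptions the 100-year figure comes out somewhat above one half, and the yearly figure lands close to the quoted one in 17 billion, so the quote holds up as an order-of-magnitude statement.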