
How unique is a UUID?


Is a UUID actually unique?

No, a UUID can't be guaranteed to be unique. A random (version 4) UUID is essentially a 128-bit random number, of which 122 bits are actually random. When my computer generates a UUID, there's no practical way it can prevent your computer or any other device in the universe from generating that same UUID at some time in the future.

What makes a UUID unique?

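Universally Unique Identifiers, or UUIDs, are 128-bit numbers, composed of 16 octets and usually represented as 32 base-16 characters, that can be used to identify information across computer systems. UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation's Distributed Computing Environment (DCE); Microsoft's GUIDs are one widely used implementation. They are standardized by both the IETF (RFC 4122) and the ITU-T (X.667).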
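A quick way to see the "16 octets, 32 base-16 characters" layout is Python's standard `uuid` module. This is a minimal sketch; Python is an assumption, since the thread itself is language-agnostic.

```python
import uuid

u = uuid.uuid4()          # a random (version 4) UUID

print(len(u.bytes))       # → 16 (octets, i.e. 128 bits)
print(len(u.hex))         # → 32 (base-16 characters)
print(str(u))             # canonical 8-4-4-4-12 form: 36 characters including hyphens
print(u.version)          # → 4
print(u.variant)          # → specified in RFC 4122
```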

How many unique UUIDs are there?

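There are 2^122 (about 5.3 × 10^36) possible random version 4 UUIDs. According to Wikipedia, about 2.71 quintillion (2.71 × 10^18) of them would have to be generated for a 50% probability of at least one collision.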
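The 2.71 quintillion figure follows from the birthday-problem approximation p ≈ 1 − exp(−n² / 2N), with N = 2^122. The sketch below, in Python and assuming exactly 122 random bits per UUID, inverts that formula for p = 0.5; `draws_for_collision_probability` is an illustrative name, not a library function.

```python
import math

RANDOM_BITS = 122          # a v4 UUID has 6 fixed bits, so 128 - 6 = 122 random bits
N = 2 ** RANDOM_BITS       # number of distinct random v4 UUIDs

def draws_for_collision_probability(p: float, n_values: int = N) -> float:
    """Birthday approximation: how many random values must be drawn from a
    pool of n_values for probability p of at least one duplicate.
    Solves p = 1 - exp(-n**2 / (2 * n_values)) for n."""
    return math.sqrt(2 * n_values * -math.log1p(-p))

print(f"{N:.2e}")                                      # → 5.32e+36
print(f"{draws_for_collision_probability(0.5):.3e}")   # → 2.715e+18 (about 2.71 quintillion)
```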

Is a UUID always the same?

A UUID (Universally Unique Identifier) is a 128-bit value used to uniquely identify an object or entity on the internet. Depending on the specific mechanism used, a UUID is either guaranteed to be different from, or at least extremely likely to be different from, any other UUID generated until A.D. 3400.


Very safe:

the annual risk of a given person being hit by a meteorite is estimated to be one chance in 17 billion, which means the probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.

Caveat:

However, these probabilities only hold when the UUIDs are generated using sufficient entropy. Otherwise, the probability of duplicates could be significantly higher, since the statistical dispersion might be lower. Where unique identifiers are required for distributed applications, so that UUIDs do not clash even when data from many devices is merged, the randomness of the seeds and generators used on every device must be reliable for the life of the application. Where this is not feasible, RFC4122 recommends using a namespace variant instead.

Source: The "Random UUID probability of duplicates" section of the Wikipedia article on Universally unique identifiers (a revision from December 2016, before later edits reworked the section).

Also see the current "Collisions" section of the same Wikipedia article, which covers the same subject.
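The caveat about seeds is easy to demonstrate. This Python sketch assumes two hypothetical devices that seed Python's `random` module (a Mersenne Twister, not a cryptographic generator) with the same low-entropy value; `uuid4_from_prng` is a hypothetical helper, not a standard function. Every UUID the two devices produce is identical, while `uuid.uuid4()`, which reads its bits from `os.urandom()`, shows no overlap.

```python
import random
import uuid

def uuid4_from_prng(rng: random.Random) -> uuid.UUID:
    """Hypothetical helper: build a version 4 UUID from a non-cryptographic PRNG.
    Passing version=4 overwrites the version and variant bits."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)

# Two hypothetical devices that seed their generator with the same value
# (e.g. a boot-time counter that always starts at 0).
device_a = random.Random(0)
device_b = random.Random(0)

a = [uuid4_from_prng(device_a) for _ in range(1000)]
b = [uuid4_from_prng(device_b) for _ in range(1000)]
print(len(set(a) & set(b)))   # → 1000: every UUID collides

# uuid.uuid4() draws from os.urandom(), so it does not share that fate.
c = [uuid.uuid4() for _ in range(1000)]
print(len(set(a) & set(c)))   # no overlap expected
```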


If by "given enough time" you mean 100 years of creating them at a rate of a billion a second, then yes: by the end of those 100 years your chance of at least one collision is better than 50%.
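As a back-of-the-envelope check, the Python sketch below applies the birthday approximation, assuming 122 random bits per UUID and a 365.25-day year. By this estimate the probability after 100 years at a billion UUIDs per second is about 61%, and the 50% mark is reached after roughly 86 years; it also recovers the "few tens of trillions" figure from the meteorite comparison.

```python
import math

N = 2 ** 122                       # possible random version 4 UUIDs
RATE = 1_000_000_000               # UUIDs generated per second (from the answer)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def collision_probability(n: float, n_values: int = N) -> float:
    """Birthday approximation: P(at least one duplicate among n random draws)."""
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
    return -math.expm1(-n * n / (2 * n_values))

n_100_years = RATE * SECONDS_PER_YEAR * 100
print(f"{collision_probability(n_100_years):.2f}")       # → 0.61

# Years at this rate until the probability reaches 50%
n_half = math.sqrt(2 * N * math.log(2))
print(f"{n_half / RATE / SECONDS_PER_YEAR:.0f}")         # → 86

# Meteorite comparison: UUIDs per year for a 6e-11 chance of a duplicate
print(f"{math.sqrt(2 * N * -math.log1p(-6e-11)):.1e}")   # → 2.5e+13 (tens of trillions)
```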


There is more than one type of UUID, so "how safe" depends on which type (which the UUID specifications call "version") you are using.

  • Version 1 is the time-based plus MAC address UUID. Its 128 bits contain 48 bits for the network card's MAC address (which is uniquely assigned by the manufacturer) and a 60-bit clock with a resolution of 100 nanoseconds. A 60-bit count of 100 ns intervals lasts about 3,650 years, so that clock does not wrap until around A.D. 5236 (RFC 4122 more cautiously says rollover happens around A.D. 3400, depending on the specific algorithm used), and these UUIDs are safe at least until the rollover (unless you need more than 10 million new UUIDs per second or someone clones your network card). I say "at least" because the clock starts at 15 October 1582, so you have about 400 years after the clock wraps before there is even a small possibility of duplications.

  • Version 4 is the random number UUID. There are six fixed bits (four for the version, two for the variant) and the remaining 122 bits are random. See Wikipedia or other analyses that describe how very unlikely a duplicate is.

  • Version 3 uses MD5 and Version 5 uses SHA-1 hashes of a namespace and a name to create those 122 bits, instead of a random or pseudo-random number generator. So in terms of safety they are like Version 4: a duplicate is a statistical issue, as long as you make sure the input to the digest algorithm is always unique.

  • Version 2 is similar to Version 1, but with a smaller clock, so it wraps around much sooner. Since Version 2 UUIDs are intended for DCE Security, you shouldn't be using them.
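The different versions can be poked at with Python's standard `uuid` module, which implements at least versions 1, 3, 4 and 5 (a sketch; `example.com` is just an arbitrary name chosen for illustration). It also does the arithmetic for the Version 1 clock: by this calculation a 60-bit count of 100 ns intervals from 15 October 1582 overflows in A.D. 5236, while RFC 4122 conservatively says rollover happens around A.D. 3400, depending on the specific algorithm used.

```python
import uuid
from datetime import datetime, timedelta

# Version 4: 122 random bits; the version and variant bits are fixed.
v4 = uuid.uuid4()
print(v4.version)                      # → 4

# Versions 3 (MD5) and 5 (SHA-1) are deterministic: the same namespace and
# name always give the same UUID, so uniqueness depends on unique names.
a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
c = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
print(a == b, a.version, c.version)    # → True 5 3

# Version 1: a 60-bit count of 100 ns intervals since 15 October 1582.
# 2**60 intervals of 100 ns = 2**60 // 10 microseconds.
GREGORIAN_EPOCH = datetime(1582, 10, 15)
wrap = GREGORIAN_EPOCH + timedelta(microseconds=(2 ** 60) // 10)
print(wrap.year)                       # → 5236
```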

So for all practical purposes they are safe. If you are uncomfortable with leaving it up to probabilities (e.g. you are the type of person worried about the earth getting destroyed by a large asteroid in your lifetime), just make sure you use a Version 1 UUID and it is guaranteed to be unique (in your lifetime, unless you plan to live for many more centuries).

So why doesn't everyone simply use Version 1 UUIDs? Because Version 1 UUIDs reveal the MAC address of the machine they were generated on, and they can be predictable: two things that might have security implications for the application using those UUIDs.
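To see what a Version 1 UUID gives away, the sketch below (Python's standard `uuid` module, assumed for illustration) pulls out the 48-bit node field and decodes the timestamp back into a creation time. What the node contains depends on the machine: if no MAC address can be found, a random 48-bit value is used instead.

```python
import uuid
from datetime import datetime, timedelta

u = uuid.uuid1()
print(u.version)                      # → 1

# The last 48 bits ("node") normally hold the host's MAC address.
print(f"node: {u.node:012x}")

# The 60-bit timestamp counts 100 ns intervals since 15 October 1582 (UTC),
# so anyone holding the UUID can tell roughly when it was created.
created = datetime(1582, 10, 15) + timedelta(microseconds=u.time // 10)
print(f"created: {created.isoformat()} UTC")
```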


The answer to this may depend largely on the UUID version.

Many UUID generators produce version 4 (random) UUIDs. However, many of them use a pseudo-random number generator (PRNG) to generate them.

If a poorly seeded PRNG with a small period is used to generate the UUID, I would say it's not very safe at all. Some random number generators also have a poor distribution, i.e. they favour certain numbers over others. This isn't going to work well.

Therefore, it's only as safe as the algorithms used to generate it.
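To illustrate why a small period is dangerous, here is a deliberately bad generator in Python. `TinyLCG` is a hypothetical 16-bit linear congruential generator whose constants give it the full period of 2^16 states; because each UUID consumes eight 16-bit outputs, the sequence of UUIDs must repeat after 65,536 / 8 = 8,192 UUIDs, however random the individual bits look.

```python
import uuid
from typing import Optional

class TinyLCG:
    """Hypothetical, deliberately weak PRNG: a 16-bit linear congruential
    generator. Its constants satisfy the Hull-Dobell conditions, so it visits
    all 2**16 states and then repeats (period 65,536)."""

    def __init__(self, seed: int) -> None:
        self.state = seed & 0xFFFF

    def next16(self) -> int:
        self.state = (25173 * self.state + 13849) & 0xFFFF
        return self.state

    def bits128(self) -> int:
        # Concatenate eight 16-bit outputs into one 128-bit integer.
        value = 0
        for _ in range(8):
            value = (value << 16) | self.next16()
        return value

def first_duplicate(rng: TinyLCG, limit: int = 100_000) -> Optional[int]:
    """Generate v4-style UUIDs from rng; return the index of the first repeat."""
    seen = set()
    for i in range(limit):
        u = uuid.UUID(int=rng.bits128(), version=4)
        if u in seen:
            return i
        seen.add(u)
    return None

print(first_duplicate(TinyLCG(seed=12345)))   # → 8192
```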

On the flip side, if you have addressed these concerns, then I think a version 4 UUID should be very safe to use. In fact, I'm using it to identify blocks on a network block file system and so far have not had a clash.

In my case, the PRNG I'm using is a Mersenne Twister, and I'm being careful with the way it's seeded, which is from multiple sources including /dev/urandom. The Mersenne Twister has a period of 2^19937 − 1. It's going to be a very, very long time before I see a repeated UUID.

So pick a good library or generate it yourself and make sure you use a decent PRNG algorithm.
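In Python, the "good library" route is simply `uuid.uuid4()`, which takes its bits from `os.urandom()`. If you do generate UUIDs yourself, the sketch below shows the idea, assuming the operating system's cryptographically secure source (`os.urandom`) rather than the Mersenne Twister approach described above; Python's documentation warns that its Mersenne-Twister-based `random` module should not be used for security purposes. `make_uuid4` is a hypothetical helper; the two masks set the six fixed version and variant bits.

```python
import os
import uuid

def make_uuid4() -> uuid.UUID:
    """Hypothetical helper: build a version 4 UUID by hand from the OS's
    cryptographically secure random source, then set the six fixed bits."""
    b = bytearray(os.urandom(16))
    b[6] = (b[6] & 0x0F) | 0x40   # version 4 in the high nibble of octet 6
    b[8] = (b[8] & 0x3F) | 0x80   # RFC 4122 variant: top two bits of octet 8 are 10
    return uuid.UUID(bytes=bytes(b))

u = make_uuid4()
print(u, u.version)               # a fresh UUID, followed by 4
```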


Quoting from Wikipedia:

Thus, anyone can create a UUID and use it to identify something with reasonable confidence that the identifier will never be unintentionally used by anyone for anything else.

It goes on to explain in some detail how safe it actually is. So to answer your question: yes, it's safe enough.


I concur with the other answers. UUIDs are safe enough for nearly all practical purposes¹, and certainly for yours.

But suppose (hypothetically) that they aren't.

Is there a better system or a pattern of some type to alleviate this issue?

Here are a couple of approaches:

  1. Use a bigger UUID. For instance, instead of 128 random bits, use 256 or 512 or more. Each bit you add to a type-4 style UUID halves the probability of a collision, assuming that you have a reliable source of entropy².

  2. Build a centralized or distributed service that generates UUIDs and records each and every one it has ever issued. Each time it generates a new one, it checks that the UUID has never been issued before. Such a service would be technically straightforward to implement (I think) if we assumed that the people running the service were absolutely trustworthy, incorruptible, et cetera. Unfortunately, they aren't, especially when there is the possibility of governments' security organizations interfering. So this approach is probably impractical, and may be³ impossible in the real world.
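Both approaches can be sketched briefly in Python. `collision_probability` uses the birthday approximation to show the effect of extra bits (for small probabilities, each added bit halves the risk), and `IssuingRegistry` is a hypothetical, in-memory stand-in for the issuing service in approach 2. It deliberately ignores the hard parts the answer points out: persistence, replication, and trusting whoever runs it.

```python
import math
import secrets

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: P(at least one duplicate) among n random IDs of `bits` bits."""
    return -math.expm1(-n * n / (2 * 2 ** bits))

# Approach 1: one extra random bit halves a small collision probability.
ratio = collision_probability(10 ** 12, 129) / collision_probability(10 ** 12, 128)
print(round(ratio, 3))                         # → 0.5
for bits in (122, 256, 512):
    print(bits, collision_probability(10 ** 18, bits))

class IssuingRegistry:
    """Hypothetical central issuer (approach 2): remembers every ID it has
    handed out and never returns one twice. In-memory only; a real service
    would need durable, replicated storage."""

    def __init__(self, bits: int = 256) -> None:
        self.bits = bits
        self.issued: set[int] = set()

    def issue(self) -> str:
        while True:
            candidate = secrets.randbits(self.bits)
            if candidate not in self.issued:   # reject anything issued before
                self.issued.add(candidate)
                return f"{candidate:0{self.bits // 4}x}"

registry = IssuingRegistry()
ids = [registry.issue() for _ in range(1000)]
print(len(set(ids)), len(ids[0]))              # → 1000 64
```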


1 - If uniqueness of UUIDs determined whether nuclear missiles got launched at your country's capital city, a lot of your fellow citizens would not be convinced by "the probability is extremely low". Hence my "nearly all" qualification.
2 - And here's a philosophical question for you. Is anything ever truly random? How would we know if it wasn't? Is the universe as we know it a simulation? Is there a God who might conceivably "tweak" the laws of physics to alter an outcome?
3 - If anyone knows of any research papers on this problem, please comment.