A GUID is a unique number that can be used as an identifier for anything in the universe, but unlike ISBN there is no central authority - the uniqueness of a GUID relies on the algorthm that was used to generate it.
In other words, if you keep generating GUIDs on a single machine, you will not get duplicates.
As with other version 4 GUIDs there's no absolute guarantee of uniqueness, but the odds are impressive. So long as you have fewer than 326,915,130,069,135,865 (i.e. sqrt(-22^122ln(0.99))) GUIDs in play simultaneously, you can be more than 99% sure there are no collisions.
The GUID generation algorithm relies on the fact that it has all 16 bytes to use to establish uniqueness, and if you throw away half of it, you lose the uniqueness. There are multiple GUID generation algorithms, but I'll pick one of them for concreteness, specifically the version described in this Internet draft.
Yes, you can. Since GUIDs are 128 bits long, there is admittedly a minute possibility of a clash—but the word "minute" is nowhere near strong enough. There are so many GUIDs that if you generate several trillion of them randomly, you're still more likely to get hit by a meteorite than to have even one collision (from Wikipedia). And if you aren't generating them randomly, but are e.g. using the MAC-address-and-time-stamp algorithm, then they're also going to be unique, as MAC addresses are unique among computers and time stamps are unique on your computer.
Edit 1: To answer your bonus question, the optimal way to test a set of GUIDs for uniqueness is to just assume that they are all are unique. Why? Because, given the number of GUIDs you're generating, the odds of a GUID collision are smaller than the odds of a cosmic ray flipping a bit in your computer's memory and screwing up the answer given by any "accurate" algorithm you'd care to run. (See this StackOverflow answer for the math.)
There are an enormous number of GUIDs out there. To quote Douglas Adams's Hitchhiker's Guide to the Galaxy:
"Space," it says, "is big. Really big. You just won't believe how vastly hugely mindbogglingly big it is. I mean you may think it's a long way down the road to the chemist, but that's just peanuts to space, listen…"
And since there are about 7×1022 stars in the universe, and just under 2128 GUIDs, then there are approximately 4.86×1015—almost five quadrillion—GUIDs for every single star. If every one of those stars had a world with a thriving population like ours, then around each and every star, every human or alien who had ever lived would be entitled to over forty-five thousand GUIDs. For every person in history at every star in the universe. The GUID space is at the same level of hugeness as the size of the entire universe. You do not need to worry.
(Edit 2: Reflecting on this: wow. I hadn't realized myself what this meant. The GUID space is incomprehensibly massive. I'm sort of in awe of it.)
Short answer: for practical purposes, yes.
However, you have to consider the birthday paradox!
I have calculated a few representative collision probabilities. With 122-bit UUIDs as specified in the Wikipedia article, the probability of collision is 1/2 if you generate at least 2.71492e18
UUIDs. With 10^19 UUIDs, the probability is 0.999918. With 10^17 UUIDs, 0.000939953.
Some numbers for comparison can be found on Wikipedia. So you can safely assign a UUID for each human that has lived, each galaxy in the observable universe, each fish in the ocean, and each individual ant on Earth. However, collisions are almost certain if you generate a UUID for each transistor humanity produces in a year, each insect on Earth, each grain of sand on Earth, each star in the observable universe, or anything larger.
If you generate 1 billion UUIDs per second, it would take about 36 years to get a collision probability of 10%.
Eventually, there will probably be a collision among the set of UUIDs generated over the course of human history. Still, the probability that collided UUIDs will be used for the same purpose is vanishingly small, so there's no problem in practice.
An analysis of the possibility of collision is available on Wikipedia: http://en.wikipedia.org/wiki/Uuid#Random_UUID_probability_of_duplicates
As mentioned in the link, this will be affected by the properties of the random number generator.
There is also the possibility of a bug in GUID generator code; while the chances are low, they are probably higher than the chances of a collision based on the mathematics.
A Bloom filter might be appropriate; it can quickly tell you if a GUID is unique, but there's a chance for a false indication of a collision. An alternate method if you're testing a batch at a time is to sort the batch and compare each successive element.
In general, yes it is safe to assume.
If your GUID generator is truly random, the possibilities of a clash within a 1000 GUIDs is extraordinarily small.
Of course, that assumes a good GUID generator. So the question is really about how much you trust the tool you're using to generate GUID and does it have its own tests?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With