A GUID is a unique number that can be used as an identifier for anything in the universe, but unlike ISBN there is no central authority - the uniqueness of a GUID relies on the algorthm that was used to generate it.
In other words, if you keep generating GUIDs on a single machine, you will not get duplicates.
How unique is a GUID? 128-bits is big enough and the generation algorithm is unique enough that if 1,000,000,000 GUIDs per second were generated for 1 year the probability of a duplicate would be only 50%. Or if every human on Earth generated 600,000,000 GUIDs there would only be a 50% probability of a duplicate.
The GUID generation algorithm relies on the fact that it has all 16 bytes to use to establish uniqueness, and if you throw away half of it, you lose the uniqueness. There are multiple GUID generation algorithms, but I'll pick one of them for concreteness, specifically the version described in this Internet draft.
Kai, I have provided a program that will do what you want using threads. It is licensed under the following terms: you must pay me $0.0001 per hour per CPU core you run it on. Fees are payable at the end of each calendar month. Please contact me for my paypal account details at your earliest convenience.
using System;
using System.Collections.Generic;
using System.Linq;
namespace GuidCollisionDetector
{
class Program
{
static void Main(string[] args)
{
//var reserveSomeRam = new byte[1024 * 1024 * 100]; // This indeed has no effect.
Console.WriteLine("{0:u} - Building a bigHeapOGuids.", DateTime.Now);
// Fill up memory with guids.
var bigHeapOGuids = new HashSet<Guid>();
try
{
do
{
bigHeapOGuids.Add(Guid.NewGuid());
} while (true);
}
catch (OutOfMemoryException)
{
// Release the ram we allocated up front.
// Actually, these are pointless too.
//GC.KeepAlive(reserveSomeRam);
//GC.Collect();
}
Console.WriteLine("{0:u} - Built bigHeapOGuids, contains {1} of them.", DateTime.Now, bigHeapOGuids.LongCount());
// Spool up some threads to keep checking if there's a match.
// Keep running until the heat death of the universe.
for (long k = 0; k < Int64.MaxValue; k++)
{
for (long j = 0; j < Int64.MaxValue; j++)
{
Console.WriteLine("{0:u} - Looking for collisions with {1} thread(s)....", DateTime.Now, Environment.ProcessorCount);
System.Threading.Tasks.Parallel.For(0, Int32.MaxValue, (i) =>
{
if (bigHeapOGuids.Contains(Guid.NewGuid()))
throw new ApplicationException("Guids collided! Oh my gosh!");
}
);
Console.WriteLine("{0:u} - That was another {1} attempts without a collision.", DateTime.Now, ((long)Int32.MaxValue) * Environment.ProcessorCount);
}
}
Console.WriteLine("Umm... why hasn't the universe ended yet?");
}
}
}
PS: I wanted to try out the Parallel extensions library. That was easy.
And using OutOfMemoryException as control flow just feels wrong.
EDIT
Well, it seems this still attracts votes. So I've fixed the GC.KeepAlive() issue. And changed it to run with C# 4.
And to clarify my support terms: support is only available on the 28/Feb/2010. Please use a time machine to make support requests on that day only.
EDIT 2 As always, the GC does a better job than I do at managing memory; any previous attempts at doing it myself were doomed to failure.
This will run for a lot more than hours. Assuming it loops at 1 GHz (which it won't - it will be a lot slower than that), it will run for 10790283070806014188970 years. Which is about 83 billion times longer than the age of the universe.
Assuming Moores law holds, it would be a lot quicker to not run this program, wait several hundred years and run it on a computer that is billions of times faster. In fact, any program that takes longer to run than it takes CPU speeds to double (about 18 months) will complete sooner if you wait until the CPU speeds have increased and buy a new CPU before running it (unless you write it so that it can be suspended and resumed on new hardware).
A GUID is theoretically non-unique. Here's your proof:
However, if the entire power output of the sun was directed at performing this task, it would go cold long before it finished.
GUIDs can be generated using a number of different tactics, some of which take special measures to guarantee that a given machine will not generate the same GUID twice. Finding collisions in a particular algorithm would show that your particular method for generating GUIDs is bad, but would not prove anything about GUIDs in general.
Of course GUIDs can collide. Since GUIDs are 128-bits, just generate 2^128 + 1
of them and by the pigeonhole principle there must be a collision.
But when we say that a GUID is a unique, what we really mean is that the key space is so large that it is practically impossible to accidentally generate the same GUID twice (assuming that we are generating GUIDs randomly).
If you generate a sequence of n
GUIDs randomly, then the probability of at least one collision is approximately p(n) = 1 - exp(-n^2 / 2 * 2^128)
(this is the birthday problem with the number of possible birthdays being 2^128
).
n p(n)
2^30 1.69e-21
2^40 1.77e-15
2^50 1.86e-10
2^60 1.95e-03
To make these numbers concrete, 2^60 = 1.15e+18
. So, if you generate one billion GUIDs per second, it will take you 36 years to generate 2^60
random GUIDs and even then the probability that you have a collision is still 1.95e-03
. You're more likely to be murdered at some point in your life (4.76e-03
) than you are to find a collision over the next 36 years. Good luck.
If you're worried about uniqueness you can always purchase new GUIDs so you can throw away your old ones. I'll put some up on eBay if you'd like.
Personally, I think the "Big Bang" was caused when two GUIDs collided.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With