Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UUID collision risk using different algorithms

Tags:

I have a database where 2 (or maybe 3 or 4) different applications are inserting information. The new information has IDs of the type GUID/UUID, but each application is using a different algorithm to generate the IDs. For example, one is using the NHibernate's "guid.comb", other is using the SQLServer's NEWID(), other might want to use .NET's Guid.NewGuid() implementation.

Is there an above normal risk of ID collision or duplicates?

Thanks!

like image 462
Diego Jancic Avatar asked Jun 14 '10 14:06

Diego Jancic


People also ask

Can UUID ever collide?

A collision is possible but the total number of unique keys generated is so large that the possibility of a collision is almost zero. As per Wikipedia, the number of UUIDs generated to have atleast 1 collision is 2.71 quintillion. This is equivalent to generating around 1 billion UUIDs per second for about 85 years.

Should you check for UUID collisions?

You should certainly detect if a collision occurs, and your application should throw an exception if it does happen. E.g. if the UUID is used as primary key in the database, then the database should throw an error when inserting a colliding ID.

Can UUID run out?

A sample of 3.26*10¹⁶ UUIDs has a 99.99% chance of not having any duplicates. Generating that many UUIDs, at a rate of one per second, would take a billion years.


2 Answers

The risk of collisions is elevated slightly but still vanishingly small. Consider that:

  • Both Comb and NEWID/NEWSEQUENTIALID include a timestamp with precision down to a few ms. Thus, unless you are generating a large number of IDs at the exact same moment time from all of these different sources, it is literally impossible for IDs to collide.

  • The part of the GUID that isn't based on the timestamp can be thought of as random; most GUID algorithms base these digits on a PRNG. Thus, the likelihood of a collision between these other 10 bytes or so is on the same order as if you used two separate random number generators and watched for collisions.

    Think about this for a moment - PRNGs can and do repeat numbers, so the likelihood of a collision between two of them isn't significantly higher than a collision using just one of them, even if they use slightly different algorithms. It's sort of like playing the same lottery numbers every week vs. picking a random set every week - the odds of winning are exactly the same either way.

Now, keep in mind that when you use an algorithm like Guid.Comb, you only have 10 bits of uniqueifier, which equates to 1024 separate values. So if you're generating a huge number of GUIDs within the same few milliseconds, you will get collisions. But if you generate GUIDs at a fairly low frequency, it doesn't really matter how many different algorithms you use at the same time, the likelihood of a collision is still practically nonexistent.

The best way for you to be absolutely certain is to run a test; have all 2 or 3 (or however many you use) generating GUIDs, at the same time, at regular intervals, and write them out to a log file, and see if you get collisions (and if so, how many). That should give you a good idea of how safe this is in practice.

P.S. If you're using NHibernate's comb generator to generate GUIDs for a clustered primary key, consider using NEWSEQUENTIALID() instead of NEWID() - the whole point of Comb is to avoid page splits, and you're not accomplishing that if you have other processes using non-sequential algorithms. You should also change any code using Guid.NewGuid to use the same Comb generator - the actual Comb algorithm used in NHibernate is not complicated and easy to duplicate in your own domain logic.

† Note that there seems to be some dispute about NEWID, and whether or not it contains a timestamp. In any case, since it is based on the MAC address, the range of possible values is considerably smaller than a V4 GUID or a Comb. Further reason for me to recommend sticking to Comb GUIDs outside the database and NEWSEQUENTIALID inside the database.

like image 192
Aaronaught Avatar answered Oct 03 '22 13:10

Aaronaught


Yes, the risk is above normal, because all of these use different definitions of "GUID." Guid.NewGuid() is an RFC-compliant mostly-random GUID, but NEWSEQUENTIALID is a reordered (and therefore non-RFC-compliant) GUID based on MAC address and timestamp, and NHibernate's comb GUID is completely different (based on randomness and timestamp).

You may want to consider just standardizing on one GUID implementation. I use my own type of combed GUID for all my apps. My blog has brief descriptions of all these types of GUIDs along with design decisions for my own.

like image 24
Stephen Cleary Avatar answered Oct 03 '22 14:10

Stephen Cleary