UUID collision risk using different algorithms

Tags:

I have a database where 2 (or maybe 3 or 4) different applications are inserting information. The new information has IDs of the type GUID/UUID, but each application is using a different algorithm to generate the IDs. For example, one is using the NHibernate's "guid.comb", other is using the SQLServer's NEWID(), other might want to use .NET's Guid.NewGuid() implementation.

Is there an above normal risk of ID collision or duplicates?

Thanks!

462

asked Jun 14 '10 14:06

Diego Jancic

2 Answers

The risk of collisions is elevated slightly but still vanishingly small. Consider that:

Both Comb and NEWID/NEWSEQUENTIALID include a timestamp with precision down to a few ms^†. Thus, unless you are generating a large number of IDs at the exact same moment time from all of these different sources, it is literally impossible for IDs to collide.
The part of the GUID that isn't based on the timestamp can be thought of as random; most GUID algorithms base these digits on a PRNG. Thus, the likelihood of a collision between these other 10 bytes or so is on the same order as if you used two separate random number generators and watched for collisions.

Think about this for a moment - PRNGs can and do repeat numbers, so the likelihood of a collision between two of them isn't significantly higher than a collision using just one of them, even if they use slightly different algorithms. It's sort of like playing the same lottery numbers every week vs. picking a random set every week - the odds of winning are exactly the same either way.

Now, keep in mind that when you use an algorithm like Guid.Comb, you only have 10 bits of uniqueifier, which equates to 1024 separate values. So if you're generating a huge number of GUIDs within the same few milliseconds, you will get collisions. But if you generate GUIDs at a fairly low frequency, it doesn't really matter how many different algorithms you use at the same time, the likelihood of a collision is still practically nonexistent.

The best way for you to be absolutely certain is to run a test; have all 2 or 3 (or however many you use) generating GUIDs, at the same time, at regular intervals, and write them out to a log file, and see if you get collisions (and if so, how many). That should give you a good idea of how safe this is in practice.

P.S. If you're using NHibernate's comb generator to generate GUIDs for a clustered primary key, consider using NEWSEQUENTIALID() instead of NEWID() - the whole point of Comb is to avoid page splits, and you're not accomplishing that if you have other processes using non-sequential algorithms. You should also change any code using Guid.NewGuid to use the same Comb generator - the actual Comb algorithm used in NHibernate is not complicated and easy to duplicate in your own domain logic.

_{† Note that there seems to be some dispute about NEWID, and whether or not it contains a timestamp. In any case, since it is based on the MAC address, the range of possible values is considerably smaller than a V4 GUID or a Comb. Further reason for me to recommend sticking to Comb GUIDs outside the database and NEWSEQUENTIALID inside the database.}

192

answered Oct 03 '22 13:10

Aaronaught

Yes, the risk is above normal, because all of these use different definitions of "GUID." Guid.NewGuid() is an RFC-compliant mostly-random GUID, but NEWSEQUENTIALID is a reordered (and therefore non-RFC-compliant) GUID based on MAC address and timestamp, and NHibernate's comb GUID is completely different (based on randomness and timestamp).

You may want to consider just standardizing on one GUID implementation. I use my own type of combed GUID for all my apps. My blog has brief descriptions of all these types of GUIDs along with design decisions for my own.

answered Oct 03 '22 14:10

Stephen Cleary

Related questions
                            
                                Parsing a string for dates in PHP
                            
                                What is the purpose of the IsSerializable interface in GWT (regarding the RPC mechanism)
                            
                                Using ASP.Net MVC Data Annotation outside of MVC
                            
                                Using Mercurial with Visual Studio 2010 [closed]
                            
                                How to launch PowerShell script from the OS command line?
                            
                                How can I compare files in a JUnit test case? [duplicate]
                            
                                CVS Ignore Directories
                            
                                UIActionSheet Change arrow position?
                            
                                Compact decompression library for embedded use
                            
                                Can you access a Django Model "property" from it's ModelForm?
                            
                                Cannot create a row of size 8937 which is greater than the allowable maximum of 8060
                            
                                Difference between JAVA_OPTS and JAVA_TOOL_OPTIONS?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With