
Which method for generating GUIDs is best for ensuring the GUID is really unique?

I have been using GUIDs for a long time, but I couldn't find a clear explanation of when they are not guaranteed to be unique. Here is my understanding; please correct it as you see fit.

A value is guaranteed to be unique on the machine where it is created. So GUIDs generated by a .NET app on a machine will never conflict with those created by SQL Server on the same machine.

In all other cases there is no guarantee. Theoretically, if multiple sources (say, applications on different computers) create values for a uniqueidentifier column in a database on yet another computer, duplicates can end up being inserted.

EDIT: Apologies for not forming the question properly. I wanted to know the probability of conflicts when GUIDs are generated on the same machine versus different machines, whether by C#'s Guid.NewGuid(), SQL Server's newid() and newsequentialid() functions, or any other functions used by different applications. One of the replies at the 'already answered' link reports actually encountering a collision, which makes you wonder how frequently that can happen.

If I have a table with a uniqueidentifier column as the primary key, do I need to check for uniqueness on every insert when the new value is created by different apps or by SQL Server's functions?

Brian asked Aug 27 '13 17:08


1 Answer

"I guess I wanted to know the probability of conflicts when generated on the same machine vs different machines, if C#'s Guid.NewGuid(), SQL Server's newid(), newsequentialid() functions or any other functions by different applications"

That's an impossibly broad question. "any other function in a different application" is not something we can reason about.

Instead let's ask an answerable question and then answer it.

What are the different mechanisms for generating GUIDs?

Version one GUIDs combine the MAC address of the current machine, the current time, and a few more bits of implementation-specific origin. They are therefore unique in time and space.

Version three and five GUIDs use a crypto-strength hash (MD5 for version three, SHA-1 for version five) of a unique string. Their probability of collision is based on the probability of a hash collision.

Version four GUIDs use a pseudorandom number generator. Their probability of collision is based on the probability of the PRNG generating a collision.
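As a rough illustration of the three mechanisms above, here is a minimal sketch using Python's standard `uuid` module (Python is used for convenience; it is not the C# or SQL Server code the question mentions). The name "example.com" is an arbitrary choice for the demonstration.

```python
import uuid

# Version 1: timestamp + clock sequence + node ID (normally the MAC address).
g1 = uuid.uuid1()

# Versions 3 and 5: hash of a namespace plus a name string.
# "example.com" is an arbitrary name chosen for illustration.
g3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")  # MD5-based
g5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")  # SHA-1-based

# Version 4: random bits.
g4 = uuid.uuid4()

# The version number is stored inside the GUID itself.
versions = [g.version for g in (g1, g3, g4, g5)]
print(versions)  # [1, 3, 4, 5]
```

Because the version is encoded in the value, you can inspect an existing GUID to see which mechanism produced it.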

What is the probability of a version one GUID collision on two different machines, given the assumption that machines have unique MAC addresses?

Zero.

What is the probability of a version one GUID collision when running two virtual machines on the same physical machine and generating GUIDs on each?

High; if the GUIDs are generated in the same time slice then they have a high probability of colliding.

If it hurts when you do that, don't do it.
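To see why the virtual-machine case is risky, it may help to note that a version one GUID is fully determined by its timestamp, clock sequence, and node ID. The sketch below (Python, standard `uuid` module) rebuilds a v1 GUID from those three parts. The helper `v1_from_parts` and all the numeric values are hypothetical, and it assumes the two VMs happen to share the clock sequence as well as the MAC address.

```python
import uuid

def v1_from_parts(timestamp, clock_seq, node):
    """Assemble a version 1 GUID from a 60-bit timestamp,
    a 14-bit clock sequence, and a 48-bit node ID (hypothetical helper)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = (timestamp >> 48) & 0x0FFF
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = (clock_seq >> 8) & 0x3F
    return uuid.UUID(
        fields=(time_low, time_mid, time_hi_version,
                clock_seq_hi_variant, clock_seq_low, node),
        version=1,
    )

SHARED_MAC = 0x001122334455   # assumed: both VMs report the same MAC address
SAME_TICK = 0x1EC9414C232AB00  # assumed: both VMs read the same 100 ns tick
CLOCK_SEQ = 0x1234             # assumed: both VMs use the same clock sequence

vm_a = v1_from_parts(SAME_TICK, CLOCK_SEQ, SHARED_MAC)
vm_b = v1_from_parts(SAME_TICK, CLOCK_SEQ, SHARED_MAC)
later = v1_from_parts(SAME_TICK + 1, CLOCK_SEQ, SHARED_MAC)

print(vm_a == vm_b)   # True: identical inputs give an identical GUID
print(vm_a == later)  # False: one tick later is enough to differ
```

Within a single process a generator can force the timestamp to keep increasing, but two separate VMs have no such coordination.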

The rest of the GUID algorithms do not depend on details of the machine.

What is the probability of a version three or five GUID colliding with another, given that the source strings are unique?

The probability is roughly the same as the probability of a version four GUID collision, so let's consider that.
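Note that name-based GUIDs are deterministic: the same source string always yields the same GUID, so uniqueness depends entirely on the source strings being unique. A minimal Python sketch, with made-up URLs as the source strings:

```python
import uuid

# Hypothetical source strings; any unique strings would do.
first = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/orders/1")
first_again = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/orders/1")
second = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/orders/2")

print(first == first_again)  # True: same input, same GUID
print(first == second)       # False: different input, different GUID
```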

What is the probability of a version four GUID collision?

The probability that a given v4 GUID will collide with any v4 GUID in a set of n unique v4 GUIDs is n divided by 2^122.

The probability that a given set of n v4 GUIDs contains at least one collision is harder to express, but it is extremely small as long as n is significantly smaller than 2^61.
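For a feel for the numbers, the sketch below computes both quantities in Python. The second function uses the standard birthday approximation, 1 - exp(-n(n-1) / 2^123), which is an assumption added here rather than something stated in the answer, and it treats the 122 non-fixed bits of a v4 GUID as uniformly random. The function names are made up for illustration.

```python
import math

V4_RANDOM_BITS = 122  # a v4 GUID has 128 bits, 6 of which are fixed

def p_hits_existing(n):
    """Probability that one new v4 GUID equals any of n existing unique ones."""
    return n / 2 ** V4_RANDOM_BITS

def p_any_collision(n):
    """Birthday approximation for at least one collision among n v4 GUIDs."""
    exponent = n * (n - 1) / 2 ** (V4_RANDOM_BITS + 1)
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-exponent)

billion = 10 ** 9
print(p_hits_existing(billion))  # vanishingly small
print(p_any_collision(billion))  # still vanishingly small
print(p_any_collision(2 ** 61))  # roughly 0.39
```

Under these assumptions, a billion GUIDs gives a collision probability far below 1e-18, while at around 2^61 GUIDs the probability of at least one collision has climbed to roughly 39%.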

Eric Lippert answered Oct 10 '22 05:10