Why do we need a canonical format for the GUID?

One hard-working day I noticed that the GUIDs I had been generating with .NET's usual Guid.NewGuid() method all had the same digit 4 at the beginning of the third block:

efeafa5f-fe21-4ab4-ba82-b9eefd5fa225
480b64d0-6762-4afe-8496-ac7cf3292898
397579c2-a4f4-4611-9fda-16e9c1e52d6a
...

Ten of them were appearing on the screen, one every second or so. I started watching for the pattern right after the fifth GUID. When the last one had the same four bits inside, I decided I was a lucky guy. I went home feeling that the whole world was open to such an exceptional person as me. The next week I found a new job, cleaned my room and called my parents.

But today I ran into the same pattern again. A thousand times. And I don't feel like the Chosen One anymore.

I googled it, and now I know about UUIDs and the canonical format with 4 bits reserved for the version and 2 for the variant.

Here's a snippet to experiment with:

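using System;

class Program
{
    static void Main(string[] args)
    {
        while (true)
        {
            var g = Guid.NewGuid();
            // Raw bytes: the first three fields are stored little-endian,
            // so the version nibble is the high half of byte 7.
            Console.WriteLine(BitConverter.ToString(g.ToByteArray()));
            // Canonical text form: the version digit starts the third block.
            Console.WriteLine(g.ToString());
            // Press Enter to generate the next GUID.
            Console.ReadLine();
        }
    }
}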

But there is still one thing I don't understand (apart from how to go on living). Why do we need these reserved bits? I can see how they do harm: exposing internal implementation details, more collisions (still nothing to worry about, but one day...), more suicides. But I don't see any benefit. Can you help me find one?

Inside GUID generation algorithm

asked Dec 19 '25 21:12 by astef


1 Answer

The number is a version identifier: if the generation algorithm changes, the number changes with it. Otherwise two different algorithms could produce exactly the same UUID for different reasons, leading to a collision.

For example, consider a contrived, simplistic UUID format:

00000000-00000000
  time  -   ip

Now suppose we change that format for some reason to:

00000000-00000000
   ip   -  time

This could generate a collision when a machine with IP 12.34.56.78 generates a UUID using the first method at time 01234567, and later a second machine with IP 01.23.45.67 generates a UUID at time 12345678 using the newer method: both come out as 01234567-12345678. But if we reserve some bits for a version identifier, UUIDs produced by the two formats always differ in those bits, so they can never collide with each other.
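The collision described above can be reproduced in a few lines. This is only a sketch of the contrived format: the class ContrivedUuid, its method names, and the "1:" / "2:" version prefix are made up for illustration and are not how real UUIDs encode the version.

```csharp
using System;

static class ContrivedUuid
{
    // Old layout from the example: time first, then IP.
    public static string FormatV1(string time, string ip) => $"{time}-{ip}";

    // New layout from the example: IP first, then time.
    public static string FormatV2(string ip, string time) => $"{ip}-{time}";

    // Hypothetical fix: reserve a leading tag that names the layout in use.
    public static string TaggedV1(string time, string ip) => "1:" + FormatV1(time, ip);
    public static string TaggedV2(string ip, string time) => "2:" + FormatV2(ip, time);

    public static void Main()
    {
        // Machine 12.34.56.78 at time 01234567, old layout.
        string a = FormatV1(time: "01234567", ip: "12345678");
        // Machine 01.23.45.67 at time 12345678, new layout.
        string b = FormatV2(ip: "01234567", time: "12345678");
        Console.WriteLine(a == b); // True: both are 01234567-12345678

        // With the version tag, the same inputs no longer collide.
        string c = TaggedV1(time: "01234567", ip: "12345678");
        string d = TaggedV2(ip: "01234567", time: "12345678");
        Console.WriteLine(c == d); // False
    }
}
```

The tag costs a little space in every ID, which is the same trade the real format makes with its version bits.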

The value 4 specifically refers to a randomly generated UUID (which therefore relies on the minuscule chance of collisions given so many random bits) rather than other methods, which can use combinations of the time, MAC address, PID, or other time and space identifiers to guarantee uniqueness.
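To see these fields for yourself, here is a minimal sketch that pulls the version and variant out of a Guid. The class GuidBits and its method names are my own, not part of .NET. It assumes the byte layout returned by Guid.ToByteArray(), where the first three fields are little-endian, so the version nibble ends up in byte 7 rather than byte 6.

```csharp
using System;

static class GuidBits
{
    // Version: the first hex digit of the third block in the canonical string.
    // ToByteArray() stores that 16-bit field little-endian, so its high byte
    // (and therefore the version nibble) is at index 7.
    public static int GetVersion(Guid g) => g.ToByteArray()[7] >> 4;

    // Variant: the top two bits of byte 8, the first byte of the fourth block.
    // Binary 10 (decimal 2) marks the layout described in RFC 4122.
    public static int GetVariant(Guid g) => g.ToByteArray()[8] >> 6;

    public static void Main()
    {
        // First GUID from the question.
        var sample = Guid.Parse("efeafa5f-fe21-4ab4-ba82-b9eefd5fa225");
        Console.WriteLine(GetVersion(sample)); // 4
        Console.WriteLine(GetVariant(sample)); // 2

        // A freshly generated one, to compare against.
        var fresh = Guid.NewGuid();
        Console.WriteLine($"{fresh}: version {GetVersion(fresh)}, variant {GetVariant(fresh)}");
    }
}
```

All three GUIDs quoted in the question give version 4 and variant 2 with this check.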

See here for the relevant spec: https://www.rfc-editor.org/rfc/rfc4122#section-4.1.3
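As for the worry that six reserved bits mean more collisions, a rough estimate suggests the cost is small. The sketch below uses the standard birthday approximation, n = sqrt(2 * ln 2 * 2^bits), for the number of random IDs needed to reach about a 50% chance of one collision. The class CollisionEstimate and the method name are hypothetical, and the estimate assumes the remaining 122 bits are uniformly random.

```csharp
using System;

static class CollisionEstimate
{
    // Birthday approximation: how many uniformly random IDs drawn from
    // 2^randomBits values give roughly even odds of at least one collision.
    public static double IdsForEvenOdds(int randomBits) =>
        Math.Sqrt(2.0 * Math.Log(2.0) * Math.Pow(2.0, randomBits));

    public static void Main()
    {
        double v4 = IdsForEvenOdds(122);   // version 4: 128 bits minus 6 reserved
        double full = IdsForEvenOdds(128); // hypothetical: all 128 bits random
        Console.WriteLine($"122 random bits: {v4:E2} IDs");
        Console.WriteLine($"128 random bits: {full:E2} IDs");
        Console.WriteLine($"ratio: {full / v4}");
    }
}
```

Under this approximation the reserved bits reduce the even-odds threshold by a factor of 8, from roughly 2.2 × 10^19 to roughly 2.7 × 10^18 IDs, which is still far more than any one system is likely to generate.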

answered Dec 24 '25 10:12 by Dave


