Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Math question regarding Python's uuid4

I'm not great with statistical mathematics, etc. I've been wondering, if I use the following:

import uuid
unique_str = str(uuid.uuid4())
double_str = ''.join([str(uuid.uuid4()), str(uuid.uuid4())])

Is double_str string squared as unique as unique_str or just some amount more unique? Also, is there any negative implication in doing something like this (like some birthday problem situation, etc)? This may sound ignorant, but I simply would not know as my math spans algebra 2 at best.

like image 966
orokusaki Avatar asked Nov 29 '10 17:11

orokusaki


People also ask

What is uuid4 () in Python?

If all you want is a unique ID, you should probably call uuid1() or uuid4() . Note that uuid1() may compromise privacy since it creates a UUID containing the computer's network address. uuid4() creates a random UUID. Depending on support from the underlying platform, uuid1() may or may not return a “safe” UUID.

Is uuid4 always unique python?

UUID is a Universally Unique Identifier. A UUID is 128 bits long number or ID to uniquely identify the documents, Users, resources or information in computer systems. UUID can guarantee the uniqueness of Identifiers across space and time.

How many characters is a uuid4?

UUID4 returns a CHARACTER(36) string that is a version 4 universally unique identifier. The UUID4 generated by PL/I is a version 4 UUID per RFC 4122. It is meant for generating UUIDs from truly-random or pseudo-random numbers.

What is the difference between UUID and uuid4?

uuid1 is "more unique", while uuid4 is more anonymous. So basically use uuid1 unless you have a reason not to.


1 Answers

The uuid4 function returns a UUID created from 16 random bytes and it is extremely unlikely to produce a collision, to the point at which you probably shouldn't even worry about it.

If for some reason uuid4 does produce a duplicate it is far more likely to be a programming error such as a failure to correctly initialize the random number generator than genuine bad luck. In which case the approach you are using it will not make it any better - an incorrectly initialized random number generator can still produce duplicates even with your approach.

If you use the default implementation random.seed(None) you can see in the source that only 16 bytes of randomness are used to initialize the random number generator, so this is an a issue you would have to solve first. Also, if the OS doesn't provide a source of randomness the system time will be used which is not very random at all.

But ignoring these practical issues, you are basically along the right lines. To use a mathematical approach we first have to define what you mean by "uniqueness". I think a reasonable definition is the number of ids you need to generate before the probability of generating a duplicate exceeds some probability p. An approcimate formula for this is:

alt text

where d is 2**(16*8) for a single randomly generated uuid and 2**(16*2*8) with your suggested approach. The square root in the formula is indeed due to the Birthday Paradox. But if you work it out you can see that if you square the range of values d while keeping p constant then you also square n.

like image 84
Mark Byers Avatar answered Sep 21 '22 23:09

Mark Byers