When should I use uuid.uuid1() vs. uuid.uuid4() in Python?

uuid1() is guaranteed not to produce any collisions (under the assumption that you do not create too many of them at the same time). I wouldn't use it if it's important that there's no connection between the UUID and the computer, as the MAC address is used to make it unique across computers.

You can create duplicates by creating more than 2¹⁴ uuid1 in less than 100ns, but this is not a problem for most use cases.

uuid4() generates, as you said, a random UUID. The chance of a collision is really, really, really small; small enough that you shouldn't worry about it. The problem is that a bad random-number generator makes collisions more likely.

This excellent answer by Bob Aman sums it up nicely. (I recommend reading the whole answer.)

Frankly, in a single application space without malicious actors, the extinction of all life on earth will occur long before you have a collision, even on a version 4 UUID, even if you're generating quite a few UUIDs per second.
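To put a rough number on that claim, here is a minimal sketch of the birthday-bound approximation for version 4 UUIDs. It assumes all 122 random bits come from a well-seeded generator; the function name and the workload figure are made up for illustration.

```python
import math

RANDOM_BITS = 122  # a version 4 UUID carries 122 random bits

def collision_probability(n, bits=RANDOM_BITS):
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n * (n - 1) / (2 * space))

# Hypothetical workload: one million UUIDs per second for a full year.
n = 10**6 * 60 * 60 * 24 * 365
print(f"{collision_probability(n):.3e}")
```

The approximation only holds if the generator is actually uniform, which is exactly the caveat about bad random-number generators mentioned above.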

One case where you may consider uuid1() rather than uuid4() is when UUIDs are produced on separate machines, for example when multiple online transactions are processed on several machines for scaling purposes.

In such a situation, duplicate IDs become more likely, both because the pseudo-random number generators on the different machines may be poorly initialized and because more UUIDs are produced overall.

Another benefit of uuid1() in that case is that the machine where each UUID was originally produced is implicitly recorded (in the "node" part of the UUID). This, together with the time information, may help, if only with debugging.
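As a sketch of what that debugging information looks like, the snippet below pulls the creation time and the node out of a version 1 UUID using the `time` and `node` attributes of `uuid.UUID`. The helper name `describe_uuid1` is made up for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUID timestamps count 100 ns intervals since 1582-10-15 (Gregorian reform).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def describe_uuid1(u):
    """Return (creation time in UTC, node as a MAC-style string) for a v1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    node = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
    return created, node

created, node = describe_uuid1(uuid.uuid1())
print(created.isoformat(), node)
```

Note that the node is only a real MAC address when the UUID was generated from one; it can also be a random value.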

My team just ran into trouble using UUID1 for a database upgrade script where we generated ~120k UUIDs within a couple of minutes. The UUID collision led to violation of a primary key constraint.

We've upgraded hundreds of servers, but on our Amazon EC2 instances we ran into this issue a few times. I suspect poor clock resolution; switching to UUID4 solved it for us.

One thing to note when using uuid1: if you use the default call (without passing the clock_seq parameter), you have a chance of running into collisions, because you have only 14 bits of randomness. Generating 18 entries within 100 ns gives you roughly a 1% chance of a collision (see the birthday paradox/attack). The problem will never occur in most use cases, but on a virtual machine with poor clock resolution it will bite you.
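The "roughly 1%" figure can be checked with a short birthday-paradox calculation. This is a sketch that assumes the 14-bit clock sequence is drawn uniformly at random for each UUID sharing a timestamp; the function name is hypothetical. The last lines only show that `uuid1()` accepts an explicit `clock_seq`.

```python
import uuid

def birthday_collision(n, slots):
    """Exact probability that n uniform draws from `slots` values contain a repeat."""
    p_unique = 1.0
    for i in range(n):
        p_unique *= (slots - i) / slots
    return 1.0 - p_unique

# 18 UUIDs sharing one timestamp, each with a random 14-bit clock sequence.
print(f"{birthday_collision(18, 2**14):.4f}")

# uuid1() takes clock_seq as an optional argument; only the low 14 bits are used.
u = uuid.uuid1(clock_seq=0x1234)
print(hex(u.clock_seq))
```

The result comes out a little under 1%, which matches the estimate above.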

Perhaps something that's not been mentioned is that of locality.

Time-based ordering or a MAC address (UUID1) can improve database performance, since it's less work to sort numbers that are close together than numbers distributed randomly (UUID4) (see here).

A second, related point is that UUID1 can be useful in debugging, even if the origin data is lost or not explicitly stored (this is obviously in conflict with the privacy issue mentioned by the OP).

In addition to the accepted answer, there's a third option that can be useful in some cases:

v1 with random MAC ("v1mc")

You can make a hybrid between v1 & v4 by deliberately generating v1 UUIDs with a random multicast MAC address (this is allowed by the v1 spec). The resulting v1 UUID is time-dependent (like regular v1), but lacks all host-specific information (like v4). It's also much closer to v4 in its collision resistance: v1mc = 60 bits of time + 61 random bits = 121 unique bits; v4 = 122 random bits.

The first place I encountered this was Postgres' uuid_generate_v1mc() function. I've since used the following Python equivalent:

from os import urandom
from uuid import uuid1
_int_from_bytes = int.from_bytes  # py3 only

def uuid1mc():
    # NOTE: The constant sets the multicast bit, which the UUIDv1 spec
    # requires for node IDs that are random rather than a real MAC address.
    return uuid1(_int_from_bytes(urandom(6), "big") | 0x010000000000)

(note: I've got a longer + faster version that creates the UUID object directly; can post if anyone wants)
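A quick way to sanity-check the function above is to confirm that the result is still a version 1 UUID and that the multicast bit is set in its node. This is a self-contained sketch that repeats the function; the constant name `MULTICAST_BIT` is introduced here for readability and is not part of the original.

```python
from os import urandom
from uuid import uuid1

MULTICAST_BIT = 0x010000000000  # least significant bit of the first node octet

def uuid1mc():
    """Version 1 UUID with a random multicast node instead of the real MAC."""
    return uuid1(int.from_bytes(urandom(6), "big") | MULTICAST_BIT)

a, b = uuid1mc(), uuid1mc()
print(a)
print(b)
print(a.version, bool(a.node & MULTICAST_BIT))
```

Each call draws a fresh random node, so consecutive UUIDs do not share a host identifier.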

In case of LARGE volumes of calls/second, this has the potential to exhaust system randomness. You could use the stdlib random module instead (it will probably also be faster). But BE WARNED: it only takes a few hundred UUIDs before an attacker can determine the RNG state, and thus partially predict future UUIDs.

import random
from uuid import uuid1

def uuid1mc_insecure():
    return uuid1(random.getrandbits(48) | 0x010000000000)
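To illustrate why that warning matters, the sketch below simplifies the attack: instead of recovering the generator state from observed UUIDs, it assumes the attacker already knows it (modelled here as a known seed). The names `node_stream`, `victim`, and `attacker` are made up for illustration.

```python
import random
from uuid import uuid1

MULTICAST_BIT = 0x010000000000

def node_stream(seed, count):
    """Node values that a Mersenne Twister seeded with `seed` would produce."""
    rng = random.Random(seed)
    return [rng.getrandbits(48) | MULTICAST_BIT for _ in range(count)]

# Same generator state on both sides means the same "random" nodes.
victim = node_stream(seed=1234, count=3)
attacker = node_stream(seed=1234, count=3)
print(victim == attacker)

# The predicted node ends up verbatim in the generated UUID.
print(uuid1(victim[0]).node == attacker[0])
```

Both lines print True: once the state is known, the node part of every future UUID is known too.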
