
Is python uuid1 sequential as timestamps?

Tags: python, uuid

The Python docs state that uuid1 uses the current time to form the UUID value, but I could not find a reference that guarantees uuid1 values are sequential.

>>> import uuid
>>> u1 = uuid.uuid1()
>>> u2 = uuid.uuid1()
>>> u1 < u2
True
Carlo Pires asked Jan 03 '12 14:01


1 Answer

Calling uuid.uuid1() with no arguments can give non-sequential results (see the answer by @basil-bourque), but it is easy to make the results sequential by passing the clock_seq or node argument. In that case uuid1 uses the pure-Python implementation, which guarantees that the timestamp part of the UUID is unique and increasing within the current process:

import time

from uuid import uuid1, getnode
from random import getrandbits

_my_clock_seq = getrandbits(14)
_my_node = getnode()


def sequential_uuid(node=None):
    return uuid1(node=node, clock_seq=_my_clock_seq)


def alt_sequential_uuid(clock_seq=None):
    return uuid1(node=_my_node, clock_seq=clock_seq)



if __name__ == '__main__':
    from itertools import count
    old_n = uuid1()  # "Native"
    old_s = sequential_uuid()  # Sequential

    native_conflict_index = None

    t_0 = time.time()

    for x in count():
        new_n = uuid1()
        new_s = sequential_uuid()

        if old_n > new_n and native_conflict_index is None:
            native_conflict_index = x

        if old_s >= new_s:
            print("Oops: non-sequential results for `sequential_uuid()`")
            break

        # Stop after 30 s (and at least 10 * 0x3fff tries), or soon after
        # the first out-of-order pair from plain `uuid.uuid1()`.
        found = native_conflict_index is not None
        if (x >= 10*0x3fff and time.time() - t_0 > 30) or (found and x > 2*native_conflict_index):
            print('No issues for `sequential_uuid()`')
            break

        old_n = new_n
        old_s = new_s

    print(f'Conflicts for `uuid.uuid1()`: {native_conflict_index is not None}')
    print(f"Tries: {x}")

Issues with multiple processes

BUT if you are running several parallel processes on the same machine, then:

  • node, which defaults to uuid.getnode(), will be the same for all the processes;
  • clock_seq has a small chance of being the same in two processes (1 in 16384 for any given pair).

That might lead to conflicts! This is a general concern when using uuid.uuid1 in parallel processes on the same machine, unless you have access to SafeUUID from Python 3.7.
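As a rough illustration, on Python 3.7+ every UUID carries an `is_safe` attribute whose value is a member of the `uuid.SafeUUID` enum. The check below is only a sketch: what it reports depends on the platform, and `unknown` is a common result.

```python
import uuid

u = uuid.uuid1()

# is_safe is one of SafeUUID.safe, SafeUUID.unsafe or SafeUUID.unknown.
print(u.is_safe)

if u.is_safe is not uuid.SafeUUID.safe:
    print("The platform does not confirm multiprocessing-safe generation")
```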

If you also make sure to set node to a unique value for each parallel process that runs this code, then conflicts should not happen.

Even if you are using SafeUUID and set a unique node, it is still possible to get non-sequential IDs if they are generated in different processes.

If some lock-related overhead is acceptable, you can store clock_seq in external atomic storage (for example, in a locked file) and increment it on each call. This lets all parallel processes share the same node value and also makes the IDs sequential. When all the parallel processes are subprocesses created with multiprocessing, clock_seq can be shared using multiprocessing.Value.
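A minimal sketch of the multiprocessing.Value variant, assuming the counter is created in the parent and handed to each child process. The names `shared_clock_seq` and `shared_seq_uuid` are hypothetical, and the UUID is generated while the lock is held so that calls are serialized across processes:

```python
from multiprocessing import Value
from uuid import getnode, uuid1

# Shared integer counter with a built-in lock; create it in the parent
# process and pass it to the children.
shared_clock_seq = Value('i', 0)


def shared_seq_uuid(counter=shared_clock_seq, node=None):
    """Hypothetical helper: uuid1 with a clock_seq incremented on every call."""
    if node is None:
        node = getnode()
    with counter.get_lock():
        # clock_seq is a 14-bit field, so the counter wraps at 0x4000.
        counter.value = (counter.value + 1) % 0x4000
        return uuid1(node=node, clock_seq=counter.value)


print(shared_seq_uuid())
```

Because clock_seq holds only 14 bits, the counter repeats after 16384 calls; from then on uniqueness relies on the timestamp again.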

imposeren answered Sep 23 '22 06:09