
Why are duplicate UUIDs being generated from python on GCP?

I am facing this weird issue. Some (5%) of my celery tasks are silently being dropped.

Digging through the Celery logs, I found that in some cases the same task ID gets generated for different tasks. Naturally, a new task overwrites an existing task with the same ID, so the old task is silently dropped (if it hasn't been executed yet).
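For anyone who wants to run the same check on their own logs, here is a minimal sketch of counting repeated task IDs. It assumes each relevant log line contains a marker string and a UUID in canonical form; the marker "Received task", the helper name find_duplicate_ids, and the sample lines are all illustrative assumptions, so adjust them to your actual log format.

```python
import re
from collections import Counter

# Matches the canonical 8-4-4-4-12 hex form of a UUID.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def find_duplicate_ids(lines, marker="Received task"):
    """Return {task_id: count} for IDs seen more than once on marker lines."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue  # only count lines that announce a new task
        match = UUID_RE.search(line)
        if match:
            counts[match.group(0).lower()] += 1
    return {task_id: n for task_id, n in counts.items() if n > 1}


# Hypothetical log lines, made up for illustration only.
sample = [
    "Received task: app.send[3f2b8c1e-0d4a-4e7b-9a61-5c2d7e8f9a10]",
    "Task app.send[3f2b8c1e-0d4a-4e7b-9a61-5c2d7e8f9a10] succeeded",
    "Received task: app.resize[7a1c2d3e-4f5a-4b6c-8d7e-9f0a1b2c3d4e]",
    "Received task: app.email[3f2b8c1e-0d4a-4e7b-9a61-5c2d7e8f9a10]",
]
duplicates = find_duplicate_ids(sample)
```

Filtering on the marker matters: the same ID legitimately appears on several lines over a task's lifetime, so only the lines that announce a new task should be counted.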

In a span of 1.5 hours, the same UUID was generated 3 times. I did some random sampling, and the duplicates always turned out to come from the same machine within a short span (1-2 hours). The server generates around 1 million UUIDs a day: a minuscule 7-digit number compared to the number of possible UUIDs, which has about 38 digits.
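To put a number on how unlikely this is, here is a rough sketch using the standard birthday approximation. It assumes the IDs are uniformly random with 122 random bits (a version-4 UUID fixes 6 of its 128 bits); the function name collision_probability is just for illustration.

```python
import math

RANDOM_BITS = 122  # version-4 UUIDs: 128 bits minus 6 fixed version/variant bits


def collision_probability(n, bits=RANDOM_BITS):
    """Birthday approximation: P ~= 1 - exp(-n * (n - 1) / (2 * 2**bits))."""
    exponent = -n * (n - 1) / (2 * 2 ** bits)
    # -expm1(x) computes 1 - exp(x) without losing precision for tiny x.
    return -math.expm1(exponent)


per_day = collision_probability(1_000_000)          # one day of traffic
per_year = collision_probability(365 * 1_000_000)   # a full year of traffic
print(per_day, per_year)
```

Under these assumptions the chance of even one collision in a day is on the order of 10^-25, so three repeats of the same value in 1.5 hours cannot be explained by bad luck with a healthy random source.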

I am running Python 3.6 and Celery 4.4.2 on a Linux VM.

Celery uses python's uuid.uuid4: Reference
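For context, CPython's uuid.uuid4 is a thin wrapper around os.urandom, so the randomness comes from the operating system rather than from any Python-level generator state. The sketch below mirrors that behavior; uuid4_sketch is an illustrative name, not part of the standard library, and this shows where the bytes come from rather than diagnosing the problem.

```python
import os
import uuid


def uuid4_sketch():
    """Roughly what uuid.uuid4() does: take 16 bytes from the OS random source."""
    # version=4 overwrites the version and variant bits; the other 122 stay random.
    return uuid.UUID(bytes=os.urandom(16), version=4)


u = uuid4_sketch()
print(u, u.version)
```

If duplicates really come out of this path, the repeated bytes would have to originate below Python, which is why the question asks about the kernel, the VM, or the hardware.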

I'm not sure how to proceed from here. Is there a bug in a version of python (or the linux kernel), some configuration issue, or a hardware/VM bug? All scenarios seem very unlikely.

Update:

The VM is a standard Google Cloud Platform compute instance running Ubuntu 18 LTS.

Vedant Agarwala asked Jun 10 '20 20:06


1 Answer

I couldn't figure out why, but I implemented a workaround.

I monkey-patched uuid.uuid4. For some reason I was unable to do the same for celery.utils.uuid or kombu.utils.uuid.

I made a very simple generator that combines the system's monotonic clock reading with the hostname to build a UUID:

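import socket
import time
import uuid

def __my_uuid_generator():
    # Hex mantissa of the monotonic clock: strip the leading '0x1.' and the exponent suffix
    time_hex = float.hex(time.monotonic())[4:-4]  # 13 chars when the suffix is 4 chars long
    # Note: str hashes are salted per process unless PYTHONHASHSEED is set
    host = hex(abs(hash(socket.gethostname())))[2:]  # up to 16 chars
    # Hex-encode the ASCII text and keep 32 hex digits, i.e. its first 16 characters
    hashed = bytes(f'{time_hex}{host}', 'ascii').hex()[:32]
    return uuid.UUID(hashed)

# Monkey patch uuid4, because https://stackoverflow.com/q/62312607/1396264. Sigh!
uuid.uuid4 = __my_uuid_generator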
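
A variation on the same idea, offered as a hedged sketch rather than a recommendation: mix the hostname, process ID, clocks, and a per-process counter, then hash the result down to 16 bytes. The name fallback_uuid and the choice of inputs are my own assumptions; the output is shaped like a version-4 UUID but is not cryptographically random.

```python
import hashlib
import itertools
import os
import socket
import time
import uuid

_counter = itertools.count()  # increases on every call within this process


def fallback_uuid():
    """Hypothetical helper: derive a UUID from host, PID, clocks, and a counter."""
    material = "{}|{}|{!r}|{!r}|{}".format(
        socket.gethostname(),  # separates machines
        os.getpid(),           # separates worker processes on one machine
        time.time(),           # wall clock, differs across restarts
        time.monotonic(),      # monotonic clock, as in the workaround above
        next(_counter),        # separates calls that land on the same clock tick
    )
    digest = hashlib.blake2b(material.encode("utf-8"), digest_size=16).digest()
    # version=4 only sets the version/variant bits so the value looks like a uuid4.
    return uuid.UUID(bytes=digest, version=4)


ids = [fallback_uuid() for _ in range(10000)]
print(len(set(ids)) == len(ids))
```

The counter is the design point here: it keeps two calls in the same process from sharing identical input even if the clock value repeats, and hashing the full string means none of the inputs is lost to truncation.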

Vedant Agarwala answered Oct 20 '22 15:10