Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - Why use anything other than uuid4() for unique strings?

I see quit a few implementations of unique string generation for things like uploaded image names, session IDs, et al, and many of them employ the usage of hashes like SHA1, or others.

I'm not questioning the legitimacy of using custom methods like this, but rather just the reason. If I want a unique string, I just say this:

>>> import uuid
>>> uuid.uuid4()
UUID('07033084-5cfd-4812-90a4-e4d24ffb6e3d')

And I'm done with it. I wasn't very trusting before I read up on uuid, so I did this:

>>> import uuid
>>> s = set()
>>> for i in range(5000000):  # That's 5 million!
>>>     s.add(str(uuid.uuid4()))
...
...
>>> len(s)
5000000

Not one repeater (I wouldn't expect one now considering the odds are like 1.108e+50, but it's comforting to see it in action). You could even half the odds by just making your string by combining 2 uuid4()s.

So, with that said, why do people spend time on random() and other stuff for unique strings, etc? Is there an important security issue or other regarding uuid?

like image 770
orokusaki Avatar asked Mar 12 '10 18:03

orokusaki


People also ask

Is uuid4 always unique python?

UUID is a Universally Unique Identifier. A UUID is 128 bits long number or ID to uniquely identify the documents, Users, resources or information in computer systems. UUID can guarantee the uniqueness of Identifiers across space and time.

How do you create a unique UUID in Python?

The uuid module provides immutable UUID objects (the UUID class) and the functions uuid1() , uuid3() , uuid4() , uuid5() for generating version 1, 3, 4, and 5 UUIDs as specified in RFC 4122. If all you want is a unique ID, you should probably call uuid1() or uuid4() .

Is UUID thread safe python?

The thread-unsafe part of Python 2.5's uuid. uuid1() is when it compares the current current timestamp to the previous timestamp. Without a lock, two processes can end up comparing against the same globally saved timestamp.

Is UUID built into Python?

UUID, Universal Unique Identifier, is a python library which helps in generating random objects of 128 bits as ids. It provides the uniqueness as it generates ids on the basis of time, Computer hardware (MAC etc.). Advantages of UUID : Can be used as general utility to generate unique random id.


2 Answers

Using a hash to uniquely identify a resource allows you to generate a 'unique' reference from the object. For instance, Git uses SHA hashing to make a unique hash that represents the exact changeset of a single a commit. Since hashing is deterministic, you'll get the same hash for the same file every time.

Two people across the world could make the same change to the same repo independently, and Git would know they made the same change. UUID v1, v2, and v4 can't support that since they have no relation to the file or the file's contents.

like image 126
Arion Avatar answered Sep 28 '22 06:09

Arion


Well, sometimes you want collisions. If someone uploads the same exact image twice, maybe you'd rather tell them it's a duplicate rather than just make another copy with a new name.

like image 29
Ben Voigt Avatar answered Sep 28 '22 06:09

Ben Voigt