I see quite a few implementations of unique string generation for things like uploaded image names, session IDs, etc., and many of them use hashes like SHA1, or others.
I'm not questioning the legitimacy of using custom methods like this, but rather just the reason. If I want a unique string, I just say this:
>>> import uuid
>>> uuid.uuid4()
UUID('07033084-5cfd-4812-90a4-e4d24ffb6e3d')
And I'm done with it. I wasn't very trusting before I read up on uuid, so I did this:
>>> import uuid
>>> s = set()
>>> for i in range(5000000):  # That's 5 million!
...     s.add(str(uuid.uuid4()))
...
>>> len(s)
5000000
Not one repeater (I wouldn't expect one now considering the odds are like 1.108e+50, but it's comforting to see it in action). You could even cut the odds much further by making your string from two uuid4()s combined.
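For a rough sense of scale, the chance of a repeat among n random IDs can be estimated with the birthday bound. This is only a sketch: it assumes the 122 random bits of a version 4 UUID come from a good random source, and the function name is made up for illustration.

```python
import math

# A version 4 UUID has 128 bits, 6 of which are fixed (version and variant),
# leaving 122 random bits.
RANDOM_BITS = 122


def collision_probability(n, bits=RANDOM_BITS):
    """Birthday-bound estimate of at least one repeat among n random IDs.

    Uses p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits)); expm1 keeps precision
    when the probability is extremely small.
    """
    space = 2 ** bits
    return -math.expm1(-(n * (n - 1)) / (2 * space))


if __name__ == "__main__":
    # Five million version 4 UUIDs, as in the experiment above.
    print(collision_probability(5_000_000))
    # Two version 4 UUIDs joined together carry 244 random bits.
    print(collision_probability(5_000_000, bits=244))
```

Joining two version 4 UUIDs doubles the number of random bits, so the estimate drops by many orders of magnitude rather than by a factor of two.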
So, with that said, why do people spend time on random() and other stuff for unique strings, etc? Is there an important security issue or other regarding uuid?
UUID stands for Universally Unique Identifier. A UUID is a 128-bit number used to uniquely identify documents, users, resources, or other information in computer systems. UUIDs are designed to be unique across space and time without a central registry; the chance of a collision is not strictly zero, but it is negligible in practice.
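As a small illustration of the 128-bit size, the same UUID object can be viewed as an integer, as raw bytes, or as text. The variable names here are only for the example.

```python
import uuid

u = uuid.uuid4()

as_int = u.int      # the whole ID as one integer below 2**128
as_bytes = u.bytes  # the same value as 16 raw bytes
as_hex = u.hex      # 32 hexadecimal digits, no hyphens
as_text = str(u)    # the usual 36-character 8-4-4-4-12 form

# The text form parses back into an equal UUID object.
round_trip = uuid.UUID(as_text)

if __name__ == "__main__":
    print(as_text, as_hex, len(as_bytes))
```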
The uuid module provides immutable UUID objects (the UUID class) and the functions uuid1(), uuid3(), uuid4(), uuid5() for generating version 1, 3, 4, and 5 UUIDs as specified in RFC 4122. If all you want is a unique ID, you should probably call uuid1() or uuid4().
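A quick sketch of the four generator functions side by side. The name "example.com" is just a placeholder; uuid3() and uuid5() derive the ID from a namespace plus a name, so they repeat for the same input, while uuid4() does not.

```python
import uuid

time_based = uuid.uuid1()    # version 1: timestamp plus node ID
random_based = uuid.uuid4()  # version 4: random bits

# Versions 3 and 5 hash a namespace and a name (MD5 and SHA-1 respectively).
name = "example.com"  # placeholder name for illustration
md5_based = uuid.uuid3(uuid.NAMESPACE_DNS, name)
sha1_based = uuid.uuid5(uuid.NAMESPACE_DNS, name)

if __name__ == "__main__":
    for u in (time_based, md5_based, random_based, sha1_based):
        print(u.version, u)
```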
The thread-unsafe part of Python 2.5's uuid.uuid1() is where it compares the current timestamp to the previous timestamp. Without a lock, two threads can end up comparing against the same globally saved timestamp.
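One general remedy for this kind of race is to serialize the calls with a lock. The following is only a sketch of that idea: locked_uuid1 and generate_concurrently are hypothetical helpers, and whether a given Python version actually needs the lock is not something this thread settles.

```python
import threading
import uuid

# Hypothetical module-level lock so only one thread runs uuid1() at a time.
_uuid1_lock = threading.Lock()


def locked_uuid1():
    """Call uuid.uuid1() while holding the lock."""
    with _uuid1_lock:
        return uuid.uuid1()


def generate_concurrently(thread_count=4, per_thread=1000):
    """Generate version 1 UUIDs from several threads and return them all."""
    results = []
    results_lock = threading.Lock()

    def worker():
        local = [locked_uuid1() for _ in range(per_thread)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    ids = generate_concurrently()
    print(len(ids), len(set(ids)))
```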
uuid is a module in Python's standard library that generates 128-bit Universally Unique Identifiers for use as IDs. Depending on the UUID version, uniqueness comes either from the current time combined with the computer's hardware address (MAC) or from random numbers. Advantage of UUID: it can be used as a general-purpose utility for generating unique IDs.
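The time and hardware parts of a version 1 UUID can be read back from the object. A small sketch; note that when Python cannot find a hardware address, the node field may be a random 48-bit number instead of a real MAC.

```python
import uuid

u = uuid.uuid1()

timestamp = u.time       # 60-bit count of 100-nanosecond intervals
node = u.node            # 48-bit node ID, usually the MAC address
clock_seq = u.clock_seq  # 14-bit sequence number

if __name__ == "__main__":
    print(u)
    print("time:", timestamp)
    print("node:", format(node, "012x"))
    print("clock_seq:", clock_seq)
```

Because the node field can expose a machine's hardware address, uuid4() is the more private choice when IDs are shown to users.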
Using a hash to uniquely identify a resource allows you to generate a 'unique' reference from the object itself. For instance, Git uses SHA hashing to make a unique hash that represents the exact changeset of a single commit. Since hashing is deterministic, you'll get the same hash for the same file every time.
Two people across the world could make the same change to the same repo independently, and Git would know they made the same change. UUID v1, v2, and v4 can't support that since they have no relation to the file or the file's contents.
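To make the determinism concrete, here is a sketch of how Git derives the ID of a file's contents (a blob) in its default SHA-1 object format: it hashes a short header followed by the bytes. The function name git_blob_id is made up for this example, and commit IDs are computed over more than file contents.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git prefixes the content with "blob <size in bytes>" and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The same bytes give the same ID on any machine.
    print(git_blob_id(b"hello world\n"))
    print(git_blob_id(b"hello world\n"))
```

Two people hashing identical bytes get identical IDs without coordinating, which a random or time-based UUID cannot offer.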
Well, sometimes you want collisions. If someone uploads the same exact image twice, maybe you'd rather tell them it's a duplicate rather than just make another copy with a new name.
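A minimal sketch of that idea, assuming uploads arrive as bytes and using a plain dict in place of real storage. The helper names, the ".png" extension, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib


def content_name(data: bytes, extension: str = ".png") -> str:
    """Build a file name from the content itself, so equal bytes collide."""
    return hashlib.sha256(data).hexdigest() + extension


def store_upload(store: dict, data: bytes) -> tuple:
    """Save data under its content name; report whether it already existed."""
    name = content_name(data)
    is_duplicate = name in store
    if not is_duplicate:
        store[name] = data
    return name, is_duplicate


if __name__ == "__main__":
    uploads = {}
    print(store_upload(uploads, b"fake image bytes"))
    print(store_upload(uploads, b"fake image bytes"))  # same name, duplicate
```

With uuid4() names, the second upload would silently become a second copy.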