I am using an API which takes a name of at most 21 characters to represent an internal session with a lifetime of around two days. I would like the name not to be meaningful, using some kind of hashing. MD5 generates 32 hex characters (SHA-1 generates 40); is there something else I could use?
For now I use 'userid[:10]' + creation time: ddhhmmss + random 3 chars.
Thanks,
If I read your question correctly, you want to generate some arbitrary identifier token which must be 21 characters max. Does it need to be highly resistant to guessing? The example you gave isn't "cryptographically strong" in that it can be guessed by searching far less than half of the entire possible keyspace.
You don't say if the characters can be any of the 256 possible byte values, or if they need to be limited to, say, printable ASCII (33-126, inclusive), or some smaller range.
There is a Python module designed for UUIDs (Universally Unique IDentifiers). You likely want uuid4, which generates a random UUID and uses OS support if available (on Linux, Mac, FreeBSD, and likely others).
>>> import uuid
>>> u = uuid.uuid4()
>>> u
UUID('d94303e7-1be4-49ef-92f2-472bc4b4286d')
>>> u.bytes
'\xd9C\x03\xe7\x1b\xe4I\xef\x92\xf2G+\xc4\xb4(m'
>>> len(u.bytes)
16
>>>
Those 16 bytes are very hard to guess (122 of the 128 bits are random; the other 6 encode the UUID version and variant), and there's no need to use the full 21 bytes your API allows if all you want is an unguessable opaque identifier.
Raw bytes are probably a bad idea anyway, because they are harder to use in logs and other debug messages and harder to compare by eye. If you can't use them, convert the bytes into something a bit more readable, such as base-64, with the result chopped down to 21 (or however many) characters:
>>> u.bytes.encode("base64")
'2UMD5xvkSe+S8kcrxLQobQ==\n'
>>> len(u.bytes.encode("base64"))
25
>>> u.bytes.encode("base64")[:21]
'2UMD5xvkSe+S8kcrxLQob'
>>>
This gives you an extremely high quality random string of length 21.
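The sessions above are Python 2, where `str.encode("base64")` exists. On Python 3, bytes objects have no `.encode` method, so a rough equivalent would go through the standard `base64` module. A minimal sketch; `short_token` is a hypothetical helper name, not part of any library:

```python
import base64
import uuid


def short_token(max_len=21):
    """Return up to max_len base-64 characters derived from a random UUID.

    short_token is a hypothetical helper name used for illustration only.
    """
    raw = uuid.uuid4().bytes                          # 16 bytes, mostly random
    encoded = base64.b64encode(raw).decode("ascii")   # 24 chars, ends in "=="
    return encoded[:max_len]                          # truncation is one-way


token = short_token()
print(token, len(token))
```

Note that `base64.b64encode` does not append the trailing newline that the Python 2 codec adds.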
You might not like the '+' or '/' which can appear in a base-64 string, since without proper escaping they might interfere with URLs. Since you already plan to use "random 3 chars", I don't think this is a worry of yours. If it is, you could replace those characters with something else ('-' and '.' might work), or remove them if present.
As others have pointed out, you could use .encode("hex") and get the hex equivalent, but that's only 4 bits of randomness per character. With 21 characters max, that gives you 84 bits of randomness, instead of 126 bits for 21 base-64 characters (or twice that, 168 bits, for 21 raw bytes). Every bit doubles your keyspace, so hex makes the theoretical search space much, much smaller than base-64: by a factor of 2^42, or about 4.4E12.
Your keyspace is still 2^84 (about 1.9E25) in size, even with hex encoding, so I think it's more a theoretical concern. I wouldn't worry about people doing brute force attacks against your system.
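The bits-per-character arithmetic can be checked directly. A small sketch, assuming the 21-character limit from the question and symbols drawn uniformly at random from each alphabet:

```python
import math

MAX_CHARS = 21  # length limit from the question

# Alphabet sizes: hex digits, base-64 symbols, arbitrary byte values.
alphabets = {"hex": 16, "base64": 64, "raw bytes": 256}

# Bits of randomness = characters * log2(alphabet size).
bits = {name: MAX_CHARS * int(math.log2(size)) for name, size in alphabets.items()}

for name, total in bits.items():
    print(f"{name:9s}: {total:3d} bits, keyspace ~ {2.0 ** total:.1e}")
```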
Edit:
P.S.: The uuid.uuid4 function uses libuuid if available. That gets its entropy from /dev/urandom (if available), otherwise from the current time and the local ethernet MAC address. If libuuid is not available, then uuid.uuid4 gets the bytes directly from os.urandom (if available), otherwise it uses the random module. The random module uses a default seed based on os.urandom (if available), otherwise a value based on the current time. Probing takes place on every function call, so if you don't have os.urandom then the overhead is a bit bigger than you might expect.
Take home message? If you know you have os.urandom then you could do
os.urandom(16).encode("base64")[:21]
but if you don't want to worry about its availability then use the uuid module.
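On Python 3.6 and later, the standard `secrets` module wraps the operating system's random source, so a similar one-liner is available there. A minimal sketch; the choice of 15 bytes is an assumption made here so that the token fits the 21-character limit without truncation:

```python
import secrets

# 15 random bytes encode to exactly 20 URL-safe base-64 characters
# (no padding, because 15 is a multiple of 3), i.e. 120 bits of randomness.
token = secrets.token_urlsafe(15)
print(token, len(token))
```

The result only contains letters, digits, '-' and '_', so no escaping is needed in URLs.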
The hexadecimal representation of MD5 makes poor use of your 21 characters: you only get 4 bits of entropy per character.
Use random characters, something like:
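import random
import string
# Every character the API accepts goes into the alphabet.
alphabet = string.ascii_letters + string.digits + ".-"
# SystemRandom draws from os.urandom, so the output is hard to predict.
rng = random.SystemRandom()
"".join(rng.choice(alphabet) for i in range(21))
Put all the acceptable characters in the alphabet string.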
While a real hash function such as SHA-1 will also give you nice results if used correctly, the added complexity and CPU cost do not seem justified for your needs. You only want a random string.
Why not take the first 21 chars of an MD5 or SHA-1 hash?
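For reference, truncating a digest could look like the sketch below. The `user_id` value and the random salt are assumptions added for illustration; hashing a predictable input on its own would give a name that anyone who knows the input can recompute.

```python
import hashlib
import os

user_id = "alice"        # hypothetical user id, for illustration only
salt = os.urandom(16)    # random per-session salt (an assumption, not in the question)

# SHA-1 yields 40 hex characters; keep the first 21 (84 bits).
digest = hashlib.sha1(salt + user_id.encode("utf-8")).hexdigest()
name = digest[:21]
print(name)
```

Since the salt is already random, the hash adds little over encoding the random bytes directly.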
The base64 module can do URL-safe encoding. So, if needed, instead of
u.bytes.encode("base64")
you could do
import base64
token = base64.urlsafe_b64encode(u.bytes)
and, conveniently, to convert back
u = uuid.UUID(bytes=base64.urlsafe_b64decode(token))
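One caveat worth checking: the round trip only works on the full encoded token. The sketch below (Python 3) shows the lengths involved. With 16 bytes the token is 24 characters, or 22 without padding, so it is slightly over the 21-character limit from the question, and a token truncated to 21 characters cannot be decoded back into the UUID.

```python
import base64
import uuid

u = uuid.uuid4()
token = base64.urlsafe_b64encode(u.bytes).decode("ascii")
print(len(token))  # 24, including the trailing "=="

# Round trip: decoding the full token recovers the same UUID.
restored = uuid.UUID(bytes=base64.urlsafe_b64decode(token))

# Stripping the padding saves two characters; it must be re-added to decode.
compact = token.rstrip("=")
restored_compact = uuid.UUID(bytes=base64.urlsafe_b64decode(compact + "=="))
print(len(compact), restored == u, restored_compact == u)
```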