Efficiently generate a 16-character, alphanumeric string

I'm looking for a very quick way to generate an alphanumeric unique id for a primary key in a table.

Would something like this work?

def genKey():
    hash = hashlib.md5(RANDOM_NUMBER).digest().encode("base64")
    alnum_hash = re.sub(r'[^a-zA-Z0-9]', "", hash)
    return alnum_hash[:16]

What would be a good way to generate random numbers? If I base it on microtime, I have to account for the possibility of several calls of genKey() at the same time from different instances.

Or is there a better way to do all this?

ensnare asked Mar 24 '10 20:03


9 Answers

Since none of the other answers gives you a random string drawn from the characters 0-9, a-z, and A-Z, here is a working solution that yields one of approximately 62^16 = 4.76724e+28 possible keys:

import random, string
x = ''.join(random.choice(string.ascii_uppercase + string.ascii_lowercase + string.digits) for _ in range(16))
print(x)

It is also very readable without knowing ASCII codes by heart.

There is an even shorter version using random.choices, which is available since Python 3.6:

import random, string
x = ''.join(random.choices(string.ascii_letters + string.digits, k=16))
print(x)
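
To put the 62^16 figure in perspective, the following sketch estimates how likely a duplicate is when many keys are drawn at random. The function name collision_probability is hypothetical, and the formula is the standard birthday-bound approximation, not something taken from the answer itself.

```python
import math

def collision_probability(n_keys: int, alphabet_size: int = 62, length: int = 16) -> float:
    """Approximate chance that n_keys uniformly random keys contain at least one duplicate."""
    keyspace = alphabet_size ** length
    # Birthday bound: p ~= 1 - exp(-n(n-1) / (2N)); expm1 keeps precision when p is tiny.
    return -math.expm1(-n_keys * (n_keys - 1) / (2 * keyspace))

print(62 ** 16)                      # size of the keyspace
print(math.log2(62) * 16)            # bits of randomness per key (about 95)
print(collision_probability(10**9))  # chance of any duplicate among a billion keys
```

Even with a billion rows the chance of a collision is tiny, but it is not zero, so a unique constraint on the column is still worth having.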

David Schumann answered Oct 14 '22 03:10


You can use this:

>>> import random
>>> ''.join(random.choice('0123456789ABCDEF') for i in range(16))
'E2C6B2E19E4A7777'

There is no guarantee that the keys generated will be unique so you should be ready to retry with a new key in the case the original insert fails. Also, you might want to consider using a deterministic algorithm to generate a string from an auto-incremented id instead of using random values, as this will guarantee you uniqueness (but it will also give predictable keys).
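
The retry idea could look roughly like the sketch below. It assumes an SQLite table named items with a text primary key; the table, the helper names gen_key and insert_with_retry, and the limit of five attempts are all made up for illustration.

```python
import random
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits

def gen_key(length: int = 16) -> str:
    """Return a random alphanumeric key (not guaranteed to be unique)."""
    return ''.join(random.choice(ALPHABET) for _ in range(length))

def insert_with_retry(conn, payload, max_attempts=5, key_func=gen_key):
    """Insert payload under a fresh random key, retrying if the key already exists."""
    for _ in range(max_attempts):
        key = key_func()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO items (id, payload) VALUES (?, ?)", (key, payload))
            return key
        except sqlite3.IntegrityError:
            continue  # primary-key collision: try again with a new key
    raise RuntimeError("could not generate a unique key")

# Hypothetical table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, payload TEXT)")
print(insert_with_retry(conn, "hello"))
```

The primary-key constraint does the uniqueness check, so no separate SELECT is needed before the insert.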

Mark Byers answered Oct 14 '22 03:10


Have a look at the uuid module (Python 2.5+).

A quick example:

>>> import uuid
>>> uid = uuid.uuid4()
>>> uid.hex
'df008b2e24f947b1b873c94d8a3f2201'

Note that the OP asked for a 16-character alphanumeric string, but UUID4 hex strings are 32 characters long. Do not truncate this string; use the complete 32 characters instead.

ChristopheD answered Oct 14 '22 01:10


In Python 3.6, released in December 2016, the secrets module was introduced.

You can now generate a random token this way (the argument counts bytes, so token_hex(16) returns 32 hexadecimal characters):

import secrets

secrets.token_hex(16)

From the Python docs:

The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets.

In particular, secrets should be used in preference to the default pseudo-random number generator in the random module, which is designed for modelling and simulation, not security or cryptography.

https://docs.python.org/3/library/secrets.html
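
Since the question asks for exactly 16 characters, it may help to spell out the length arithmetic: token_hex takes a number of bytes, and each byte becomes two hex digits. A minimal sketch:

```python
import secrets

# 8 random bytes are rendered as 16 hexadecimal characters (64 bits of randomness).
token = secrets.token_hex(8)
print(token, len(token))

# token_hex(16), as used above, is twice as long.
print(len(secrets.token_hex(16)))
```

Note that a hex token only uses the characters 0-9 and a-f, so a 16-character hex string carries less randomness than 16 characters drawn from all letters and digits.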

Brachamul answered Oct 14 '22 03:10


For random numbers a good source is os.urandom:

>>> import os
>>> import hashlib
>>> random_data = os.urandom(128)
>>> hashlib.md5(random_data).hexdigest()[:16]

rlotun answered Oct 14 '22 03:10


There's an official recipe:

import string
import secrets
alphabet = string.ascii_letters + string.digits
password = ''.join(secrets.choice(alphabet) for i in range(16))
print(password)

This will create output similar to 'STCT3jdDUkppph03'.

mathandy answered Oct 14 '22 02:10


>>> import random
>>> ''.join(random.sample([chr(c) for r in (range(48, 58), range(65, 91), range(97, 123)) for c in r], 16))
'CDh0geq3NpKtcXfP'

Jan Matějka answered Oct 14 '22 03:10


The short answer is this: use some of those characters as a timestamp and the remaining characters as a "uniquifier", a counter that is incremented by 1 on each call to your uid generator and wraps around when it overflows.

The best place to store the counter depends on how you are using it. You may find this explanation of interest, as it discusses not only how GUIDs work but also how to make a smaller one.
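
One way the timestamp-plus-counter layout could be sketched is shown below. The split of 11 characters for a microsecond timestamp and 5 for the counter, the base-62 encoding, and the names base62 and timestamp_key are assumptions made for this example, not part of the answer.

```python
import itertools
import string
import time

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols
_counter = itertools.count()  # the "uniquifier": incremented on every call

def base62(n: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` base-62 characters (wraps on overflow)."""
    n %= 62 ** width
    chars = []
    for _ in range(width):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return ''.join(reversed(chars))

def timestamp_key() -> str:
    """Return 11 characters of microsecond timestamp followed by 5 characters of counter."""
    micros = time.time_ns() // 1000
    return base62(micros, 11) + base62(next(_counter), 5)

print(timestamp_key())
print(timestamp_key())
```

The counter here lives in one process only. If several instances generate keys at the same time, as the question describes, some of the characters would also have to identify the instance, or the counter would have to be stored somewhere shared.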

Brian answered Oct 14 '22 03:10


I would prefer urandom over secrets.token_hex, as it samples from a richer character set and hence needs a smaller length to achieve the same entropy.

os.urandom, which reads from urandom, is considered secure (see the relevant answer to a question on whether urandom is secure). You can read as many bytes as you like from urandom and produce a random alphanumeric string as follows:

import base64
import math
import os
def random_alphanumeric(str_len: int) -> str:
  rand_len = 3 * (math.ceil(str_len / 3) + 1)
  return base64.b64encode(os.urandom(rand_len), altchars=b'aA').decode('ascii')[:str_len]

NOTE: The above function is not secure. Since you need a "very quick way to generate an alphanumeric", this function sacrifices security for performance: a and A (or whichever characters you choose to replace + and / with) will appear twice as often as any other character, a bias that urandom itself would not give you.
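
The bias can be made visible without any randomness. The sketch below builds a byte string that contains each of the 64 possible 6-bit values exactly once and encodes it with the same altchars; the variable names are made up for the example.

```python
import base64
import string
from collections import Counter

# The standard base64 alphabet, in order of the 6-bit value each character encodes.
STANDARD_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Decoding the alphabet gives 48 bytes that contain every 6-bit value exactly once.
raw = base64.b64decode(STANDARD_ALPHABET)

# Re-encode with '+' mapped to 'a' and '/' mapped to 'A', as in the function above.
encoded = base64.b64encode(raw, altchars=b"aA").decode("ascii")
counts = Counter(encoded)

print(len(counts))               # number of distinct output characters
print(counts["a"], counts["A"])  # how many 6-bit values map to 'a' and to 'A'
print(counts["b"])               # every other character has a single source value
```

Two of the 64 equally likely input values land on a, and two on A, so each of those characters is twice as likely as any of the other 60.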

If you put randomness above performance, you could do something like:

def secure_random_alphanumeric(str_len: int) -> str:
  ret = ''
  while len(ret) < str_len:
    rand_len = 3 * (math.ceil((str_len - len(ret)) / 3) + 2)
    ret += base64.b64encode(os.urandom(rand_len)).decode('ascii').replace('+', '').replace('/', '').replace('=', '')
  return ret[:str_len]

Note that chaining the replace calls turns out to be faster than calling them sequentially, as per this answer.

Also, in the above, +1 is replaced by +2 when determining rand_len to reduce the number of iterations needed to reach the requested length. You could even use +3 or more to further reduce the chance of another iteration, but then you would lose performance in the chained replace calls.

pavlaras answered Oct 14 '22 01:10