
Algorithms: random unique string

I need to generate a string that meets the following requirements:

  1. it should be a unique string;
  2. string length should be 8 characters;
  3. it should contain 2 digits;
  4. all symbols (non-digital characters) should be upper case.

I will store the strings in a database after generation (they will be assigned to other entities).

My intention is to do something like this:

  1. generate 2 random values from 0 to 9; they will be used as the digits in the string;
  2. generate 6 random values from 0 to 25 and add them to 64; they will be used as the 6 symbols;
  3. concatenate everything into one string;
  4. check whether the string already exists in the database; if it does, repeat.

My concern with this algorithm is that it doesn't guarantee a result in finite time (if there are already A LOT of values in the database).

Question: could you please give advice on how to improve this algorithm to be more deterministic?

Thanks.

Budda asked Feb 26 '23 08:02


2 Answers

  1. it should be a unique string;
  2. string length should be 8 characters;
  3. it should contain 2 digits;
  4. all symbols (non-digital characters) should be upper case.

Assuming:

  • requirements #2 and #3 are exact (exactly 8 chars, exactly 2 digits) and not a minimum
  • the "symbols" in requirement #4 are the 26 capital letters A through Z
  • you would like an evenly-distributed random string

Then your proposed method has two issues. One is that the letters A - Z are ASCII 65 - 90, not 64 - 89 (64 is '@'). The other is that the digits always land in the same two positions, so the results are not distributed evenly within the possible string space. That can be remedied by doing the following:

  1. Generate two different integers between 0 and 7, and sort them.
  2. Generate 2 random numbers from 0 to 9.
  3. Generate 6 random letters from A to Z.
  4. Use the two different integers in step #1 as positions, and put the 2 numbers in those positions.
  5. Put the 6 random letters in the remaining positions.
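
The five steps above could be sketched in Python roughly as follows. This is only an illustration under the assumptions listed earlier (exactly 2 digits, letters A through Z); the function name `random_code` is made up for the example, and the uniqueness check against the database is deliberately left out.

```python
import secrets
import string

def random_code() -> str:
    """Return 8 characters: exactly 2 digits and 6 uppercase letters."""
    rng = secrets.SystemRandom()  # OS-backed randomness instead of the default PRNG
    # Step 1: two different positions in 0..7 (sample never repeats an index).
    digit_positions = set(rng.sample(range(8), 2))
    # Steps 2-5: a random digit at the chosen positions, a random letter elsewhere.
    return "".join(
        rng.choice(string.digits) if i in digit_positions
        else rng.choice(string.ascii_uppercase)
        for i in range(8)
    )

print(random_code())
```

Sorting the two positions is not needed here because the string is built left to right and each position is only tested for membership.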

There are 28 possibilities for the two different integers ((8*8 - 8 duplicates) / 2 orderings), 26^6 possibilities for the letters, and 100 possibilities for the numbers, so the total number of valid combinations is Ncomb = 28 * 26^6 * 100 = 864964172800, or about 8.65 x 10^11.
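
As a quick sanity check of that count, the arithmetic can be reproduced in a few lines of Python (the variable names are just for this example):

```python
from math import comb

positions = comb(8, 2)   # ways to choose the two digit positions
letters = 26 ** 6        # six independent letters A-Z
digits = 10 ** 2         # two independent digits 0-9
n_comb = positions * letters * digits

print(positions, letters, digits, n_comb)
# prints: 28 308915776 100 864964172800
```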


edit: If you want to avoid the database for storage, but still guarantee both uniqueness of strings and have them be cryptographically secure, your best bet is a cryptographically random bijection from a counter between 0 and Nmax <= Ncomb to a subset of the space of possible output strings. (Bijection meaning there is a one-to-one correspondence between the output string and the input counter.)

This is possible with Feistel networks, which are commonly used in hash functions and symmetric cryptography (DES and Blowfish, for example). You'd probably want to choose Nmax = 2^39, which is the largest power of 2 <= Ncomb, and use a 39-bit Feistel network with a constant key that you keep secret. You then plug your counter into the Feistel network, and out comes another 39-bit number X, which you then transform into the corresponding string as follows:

  1. Repeat the following step 6 times:
  2. Take X mod 26 to generate a capital letter, and set X = X / 26 (integer division).
  3. Take X mod 100 to generate your two digits, and set X = X / 100 (integer division).
  4. X will now be between 0 and 17 inclusive (2^39 / 26^6 / 100 = 17.796...). Map this number to two unique digit positions (probably easiest using a lookup table, since we're only talking about 28 possibilities. If you had more, use Floyd's sampling algorithm to pick the distinct positions, and use the variable-base technique of mod + integer divide instead of generating a random number).
  5. Follow the random approach above, but use the numbers generated by this algorithm instead.
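
A minimal sketch of that decoding in Python, assuming the lookup table is simply the 28 sorted position pairs in lexicographic order (any fixed order would do). The names `POSITION_PAIRS` and `decode` are invented for the example; the function accepts any X below Ncomb, which includes the 39-bit case.

```python
from itertools import combinations
import string

# Lookup table: the 28 possible (sorted) pairs of digit positions.
POSITION_PAIRS = list(combinations(range(8), 2))
N_COMB = len(POSITION_PAIRS) * 26**6 * 100

def decode(x: int) -> str:
    """Map an integer in [0, N_COMB) to a distinct 8-character string."""
    if not 0 <= x < N_COMB:
        raise ValueError("x out of range")
    letters = []
    for _ in range(6):                      # six letters, base 26
        x, r = divmod(x, 26)
        letters.append(string.ascii_uppercase[r])
    x, two_digits = divmod(x, 100)          # two digits, base 100
    digits = list(f"{two_digits:02d}")
    pair = POSITION_PAIRS[x]                # what is left indexes the table
    # Place digits at the chosen positions and letters everywhere else.
    return "".join(
        digits.pop(0) if i in pair else letters.pop(0) for i in range(8)
    )

print(decode(0))           # prints: 00AAAAAA
print(decode(N_COMB - 1))  # prints: ZZZZZZ99
```

Because each mod/divide step peels off one independent "digit" of a mixed-radix number, two different inputs can never produce the same string.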

Alternatively, use 40-bit numbers, and if the output of your Feistel network is >= Ncomb, then increment the counter and try again. This covers the entire string space at the cost of rejecting invalid numbers and having to re-execute the algorithm. (But you don't need a database to do this.)
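
To make the 40-bit variant concrete, here is a rough Python sketch. Everything specific in it is an assumption made for illustration: a balanced network with two 20-bit halves, 4 rounds, HMAC-SHA256 truncated to 20 bits as the round function, and the helper names `feistel40` and `next_value`. It shows the structure only and has not been reviewed as a cryptographic design.

```python
import hashlib
import hmac
from typing import Tuple

HALF_BITS = 20                       # 40-bit block = two 20-bit halves
MASK = (1 << HALF_BITS) - 1
N_COMB = 28 * 26**6 * 100            # number of valid strings

def _round(key: bytes, rnd: int, half: int) -> int:
    """Keyed round function (assumed): HMAC-SHA256 truncated to 20 bits."""
    msg = bytes([rnd]) + half.to_bytes(3, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big") & MASK

def feistel40(key: bytes, n: int, rounds: int = 4) -> int:
    """Permute a 40-bit integer; different inputs give different outputs."""
    left, right = n >> HALF_BITS, n & MASK
    for rnd in range(rounds):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF_BITS) | right

def next_value(key: bytes, counter: int) -> Tuple[int, int]:
    """Return (value below N_COMB, next counter), skipping rejected outputs."""
    while True:
        x = feistel40(key, counter)
        counter += 1
        if x < N_COMB:
            return x, counter

key = b"replace-with-a-secret-key"   # hypothetical key for the example
value, counter = next_value(key, 0)
print(value, counter)
```

The caller has to persist the returned counter, since rejected outputs consume counter values too. With Ncomb / 2^40 of about 0.79, roughly one output in five is rejected on average.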

But this isn't something to get into unless you know what you're doing.

Jason S answered Feb 27 '23 22:02


Are these user passwords? If so, there are a couple of things you need to take into account:

  1. You must avoid 0/O and I/1, which can easily be mistaken for each other.
  2. You must avoid too many consecutive letters, which might spell out a rude word.

As far as 2 is concerned, you can avoid the problem by using LLNLLNLL as your pattern (L = letter, N = number).

If you need 1 million passwords out of a pool of 2.5 billion, you will almost certainly get clashes in your database (by the birthday approximation, about n^2 / (2N) = 200 colliding pairs are expected), so you have to deal with them gracefully. But a simple retry is enough, if your random number generator is robust.
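
A small Python sketch of that retry approach, using the LLNLLNLL pattern with the confusable characters left out. The set named `existing` stands in for the database's unique index, and `candidate`, `issue` and the `max_tries` cap are assumptions added for the example; the cap simply bounds the loop so it cannot run forever.

```python
import secrets

LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"   # A-Z without I and O
DIGITS = "23456789"                    # 0-9 without 0 and 1
PATTERN = "LLNLLNLL"                   # L = letter, N = number

def candidate() -> str:
    """Build one random string that follows PATTERN."""
    return "".join(
        secrets.choice(LETTERS if slot == "L" else DIGITS) for slot in PATTERN
    )

def issue(existing: set, max_tries: int = 100) -> str:
    """Return a string not yet in `existing`, retrying on a clash."""
    for _ in range(max_tries):
        code = candidate()
        if code not in existing:
            existing.add(code)
            return code
    raise RuntimeError("no free string found; the pool may be nearly exhausted")

seen = set()
print([issue(seen) for _ in range(3)])
```

In a real database the membership test and the insert would normally be a single insert against a unique constraint, with the retry triggered by the constraint violation.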

TonyK answered Feb 27 '23 21:02