Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to generate a random, unique, alphanumeric ID of length N in Postgres 9.6+?

I've seen a bunch of different solutions on StackOverflow that span many years and many Postgres versions, but with some of the newer features like gen_random_bytes I want to ask again to see if there is a simpler solution in newer versions.

Given IDs which contain a-zA-Z0-9, and vary in size depending on where they're used, like...

bTFTxFDPPq
tcgHAdW3BD
IIo11r9J0D
FUW5I8iCiS

uXolWvg49Co5EfCo
LOscuAZu37yV84Sa
YyrbwLTRDb01TmyE
HoQk3a6atGWRMCSA

HwHSZgGRStDMwnNXHk3FmLDEbWAHE1Q9
qgpDcrNSMg87ngwcXTaZ9iImoUmXhSAv
RVZjqdKvtoafLi1O5HlvlpJoKzGeKJYS
3Rls4DjWxJaLfIJyXIEpcjWuh51aHHtK

(Like the IDs that Stripe uses.)

How can you generate them randomly and safely (as far as reducing collisions and reducing predictability goes) with an easy way to specify different lengths for different use cases, in Postgres 9.6+?

I'm thinking that ideally the solution has a signature similar to:

generate_uid(size integer) returns text

Where size is customizable depending on your own tradeoffs for lowering the chance of collisions vs. reducing the string size for usability.

From what I can tell, it must use gen_random_bytes() instead of random() for true randomness, to reduce the chance that they can be guessed.

Thanks!


I know there's gen_random_uuid() for UUIDs, but I don't want to use them in this case. I'm looking for something that gives me IDs similar to what Stripe (or others) use, that look like: "id": "ch_19iRv22eZvKYlo2CAxkjuHxZ" that are as short as possible while still containing only alphanumeric characters.

This requirement is also why encode(gen_random_bytes(), 'hex') isn't quite right for this case, since it reduces the character set and thus forces me to increase the length of the strings to avoid collisions.

I'm currently doing this in the application layer, but I'm looking to move it into the database layer to reduce interdependencies. Here's what the Node.js code for doing it in the application layer might look like:

var crypto = require('crypto');
var set = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

function generate(length) {
  var bytes = crypto.randomBytes(length);
  var chars = [];

  for (var i = 0; i < bytes.length; i++) {
    chars.push(set[bytes[i] % set.length]);
  }

  return chars.join('');
}
like image 400
Ian Storm Taylor Avatar asked Feb 01 '17 01:02

Ian Storm Taylor


People also ask

How do you generate unique random numbers in PostgreSQL?

The random() function of postgreSQL is used to return a random number between 0 and 1 . This function can be used to generate a random number within a range.

How do I generate random text in PostgreSQL?

The PostgreSQL Provides a random() function to generate a random string with all the possible different numbers, character and symbol. We can also use random() function with cryptography or encryption function to generate a fixed length binary string.

What is UUID in Postgres?

UUID is an abbreviation for Universal Unique Identifier defined by RFC 4122 and has a size of 128-bit. It is created using internal algorithms that always generate a unique value. PostgreSQL has its own UUID data type and provides modules to generate them.

What is random function in PostgreSQL?

The random function will return a completely random number if no seed is provided (seed is set with the setseed function). The random function will return a repeatable sequence of random numbers each time a particular seed value is used (seed is set with the setseed function).


2 Answers

Figured this out, here's a function that does it:

CREATE OR REPLACE FUNCTION generate_uid(size INT) RETURNS TEXT AS $$
DECLARE
  characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  bytes BYTEA := gen_random_bytes(size);
  l INT := length(characters);
  i INT := 0;
  output TEXT := '';
BEGIN
  WHILE i < size LOOP
    output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
    i := i + 1;
  END LOOP;
  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;

And then to run it simply do:

generate_uid(10)
-- '3Rls4DjWxJ'

Warning

When doing this you need to be sure that the length of the IDs you are creating is sufficient to avoid collisions over time as the number of objects you've created grows, which can be counter-intuitive because of the Birthday Paradox. So you will likely want a length greater (or much greater) than 10 for any reasonably commonly created object, I just used 10 as a simple example.


Usage

With the function defined, you can use it in a table definition, like so:

CREATE TABLE collections (
  id TEXT PRIMARY KEY DEFAULT generate_uid(10),
  name TEXT NOT NULL,
  ...
);

And then when inserting data, like so:

INSERT INTO collections (name) VALUES ('One');
INSERT INTO collections (name) VALUES ('Two');
INSERT INTO collections (name) VALUES ('Three');
SELECT * FROM collections;

It will automatically generate the id values:

    id     |  name  | ...
-----------+--------+-----
owmCAx552Q | ian    |
ZIofD6l3X9 | victor |

Usage with a Prefix

Or maybe you want to add a prefix for convenience when looking at a single ID in the logs or in your debugger (similar to how Stripe does it), like so:

CREATE TABLE collections (
  id TEXT PRIMARY KEY DEFAULT ('col_' || generate_uid(10)),
  name TEXT NOT NULL,
  ...
);

INSERT INTO collections (name) VALUES ('One');
INSERT INTO collections (name) VALUES ('Two');
INSERT INTO collections (name) VALUES ('Three');
SELECT * FROM collections;

      id       |  name  | ...
---------------+--------+-----
col_wABNZRD5Zk | ian    |
col_ISzGcTVj8f | victor |
like image 167
Ian Storm Taylor Avatar answered Oct 21 '22 13:10

Ian Storm Taylor


Review,

  1. 26 characters in [a-z]
  2. 26 characters in [A-Z]
  3. 10 characters in [0-9]
  4. 62 characters in [a-zA-Z0-9] (base62)
  5. The function substring(string [from int] [for int]) looks useful.

So it looks something like this. First we demonstrate that we can take the random-range and pull from it.

SELECT substring(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
  1, -- 1 is 'a', 62 is '9'
  1,
);

Now we need a range between 1 and 63

SELECT trunc(random()*62+1)::int+1
FROM generate_series(1,1e2) AS gs(x)

This gets us there.. Now we just have to join the two..

SELECT substring(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
  trunc(random()*62)::int+1
  1
)
FROM generate_series(1,1e2) AS gs(x);

Then we wrap it in an ARRAY constructor (because this is fast)

SELECT ARRAY(
  SELECT substring(
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
    trunc(random()*62)::int+1,
    1
  )
  FROM generate_series(1,1e2) AS gs(x)
);

And, we call array_to_string() to get a text.

SELECT array_to_string(
  ARRAY(
      SELECT substring(
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
        trunc(random()*62)::int+1,
        1
      )
      FROM generate_series(1,1e2) AS gs(x)
  )
  , ''
);

From here we can even turn it into a function..

CREATE FUNCTION random_string(randomLength int)
RETURNS text AS $$
SELECT array_to_string(
  ARRAY(
      SELECT substring(
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
        trunc(random()*62)::int+1,
        1
      )
      FROM generate_series(1,randomLength) AS gs(x)
  )
  , ''
)
$$ LANGUAGE SQL
RETURNS NULL ON NULL INPUT
VOLATILE LEAKPROOF;

and then

SELECT * FROM random_string(10);
like image 6
NO WAR WITH RUSSIA Avatar answered Oct 21 '22 14:10

NO WAR WITH RUSSIA