Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to customize the output of the Postgres Pseudo Encrypt function?

I would like to use the pseudo_encrypt function mentioned a few times on StackOverflow to make my IDs look more random: https://wiki.postgresql.org/wiki/Pseudo_encrypt

How can I customize this to output unique "random" numbers for just me. I read somewhere that you can just change the 1366.0 constant, but I don't want to take any risks with my IDs as any potential ID duplicates would cause major issues.

I really have no idea what each constant actually does, so I don't want to mess around with it unless I get some direction. Does anyone know which constants I can safely change?

Here it is:

CREATE OR REPLACE FUNCTION "pseudo_encrypt"("VALUE" int) RETURNS int     IMMUTABLE STRICT AS $function_pseudo_encrypt$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
    l1:= ("VALUE" >> 16) & 65535;
    r1:= "VALUE" & 65535;
    WHILE i < 3 LOOP
        l2 := r1;
        r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
        r1 := l2;
        l1 := r2;
        i := i + 1;
END LOOP;
RETURN ((l1::int << 16) + r1);
END;
$function_pseudo_encrypt$ LANGUAGE plpgsql;

for bigint's

CREATE OR REPLACE FUNCTION "pseudo_encrypt"("VALUE" bigint) RETURNS bigint IMMUTABLE STRICT AS $function_pseudo_encrypt$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
BEGIN
    l1:= ("VALUE" >> 32) & 4294967295::bigint;
    r1:= "VALUE" & 4294967295;
    WHILE i < 3 LOOP
        l2 := r1;
        r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767*32767)::bigint;
        r1 := l2;
        l1 := r2;
        i := i + 1;
    END LOOP;
RETURN ((l1::bigint << 32) + r1);
END;
$function_pseudo_encrypt$ LANGUAGE plpgsql;
like image 452
avian Avatar asked Jan 08 '23 05:01

avian


1 Answers

Alternative solution: use different ciphers

Other cipher functions are now available on postgres wiki. They're going to be significantly slower, but aside from that, they're better candidates for generating customized random-looking series of unique numbers.

For 32 bit outputs, Skip32 in plpgsql will encrypt its input with a 10 bytes wide key, so you just have to choose your own secret key to have your own specific permutation (the particular order in which the 2^32 unique values will come out).

For 64 bit outputs, XTEA in plpgsql will do similarly, but using a 16 bytes wide key.

Otherwise, to just customize pseudo_encrypt, see below:

Explanations about pseudo_encrypt's implementation:

This function has 3 properties

  • global unicity of the output values
  • reversability
  • pseudo-random effect

The first and second property come from the Feistel Network, and as already explained in @CodesInChaos's answer, they don't depend on the choice of these constants: 1366 and also 150889 and 714025.

Make sure when changing f(r1) that it stays a function in the mathematical sense, that is x=y implies f(x)=f(y), or in other words the same input must always produce the same output. Breaking this would break the unicity.

The purpose of these constants and this formula for f(r1) is to produce a reasonably good pseudo-random effect. Using postgres built-in random() or similar method is not possible because it's not a mathematical function as described above.

Why these arbitrary constants? In this part of the function:

r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;

The formula and the values 1366, 150889 and 714025 come from Numerical recipes in C (1992, by William H.Press, 2nd ed.), chapter 7: random numbers, specifically p.284 and 285. The book is not directly indexable on the web but readable through an interface here: http://apps.nrbook.com/c/index.html .It's also cited as a reference in various source code implementing PRNGs.

Among the algorithms discussed in this chapter, the one used above is very simple and relatively effective. The formula to get a new random number from a previous one (jran) is:

jran = (jran * ia + ic) % im;
ran = (float) jran / (float) im;  /* normalize into the 0..1 range */

where jran is the current random integer.

This generator will necessarily loop over itself after a certain number of values (the "period"), so the constants ia, ic and im have to be chosen carefully for that period to be as large as possible. The book provides a table p.285 where constants are suggested for various lengths of the period.

ia=1366, ic=150889 and im=714025 is one of the entries for a period of 229 bits, which is way more than needed.

Finally the multiplication by 32767 or 215-1 is not part of the PRNG but meant to produce a positive half-integer from the 0..1 pseudo-random float value. Don't change that part, unless to widen the blocksize of the algorithm.

like image 116
Daniel Vérité Avatar answered Jan 16 '23 20:01

Daniel Vérité