I would like to use the pseudo_encrypt function mentioned a few times on StackOverflow to make my IDs look more random: https://wiki.postgresql.org/wiki/Pseudo_encrypt
How can I customize this to output unique "random" numbers for just me. I read somewhere that you can just change the 1366.0 constant, but I don't want to take any risks with my IDs as any potential ID duplicates would cause major issues.
I really have no idea what each constant actually does, so I don't want to mess around with it unless I get some direction. Does anyone know which constants I can safely change?
Here it is:
CREATE OR REPLACE FUNCTION "pseudo_encrypt"("VALUE" int) RETURNS int IMMUTABLE STRICT AS $function_pseudo_encrypt$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
l1:= ("VALUE" >> 16) & 65535;
r1:= "VALUE" & 65535;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
r1 := l2;
l1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1::int << 16) + r1);
END;
$function_pseudo_encrypt$ LANGUAGE plpgsql;
for bigint's
CREATE OR REPLACE FUNCTION "pseudo_encrypt"("VALUE" bigint) RETURNS bigint IMMUTABLE STRICT AS $function_pseudo_encrypt$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
BEGIN
l1:= ("VALUE" >> 32) & 4294967295::bigint;
r1:= "VALUE" & 4294967295;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767*32767)::bigint;
r1 := l2;
l1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1::bigint << 32) + r1);
END;
$function_pseudo_encrypt$ LANGUAGE plpgsql;
Alternative solution: use different ciphers
Other cipher functions are now available on postgres wiki. They're going to be significantly slower, but aside from that, they're better candidates for generating customized random-looking series of unique numbers.
For 32 bit outputs, Skip32 in plpgsql will encrypt its input with a 10 bytes wide key, so you just have to choose your own secret key to have your own specific permutation (the particular order in which the 2^32 unique values will come out).
For 64 bit outputs, XTEA in plpgsql will do similarly, but using a 16 bytes wide key.
Otherwise, to just customize pseudo_encrypt
, see below:
Explanations about pseudo_encrypt
's implementation:
This function has 3 properties
The first and second property come from the Feistel Network, and as already explained in @CodesInChaos's answer, they don't depend on the choice of these constants: 1366
and also 150889
and 714025
.
Make sure when changing f(r1)
that it stays a function in the mathematical sense, that is x=y
implies f(x)=f(y)
, or in other words the same input must always produce the same output. Breaking this would break the unicity.
The purpose of these constants and this formula for f(r1)
is to produce a reasonably good pseudo-random effect. Using postgres built-in random()
or similar method is not possible because it's not a mathematical function as described above.
Why these arbitrary constants? In this part of the function:
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
The formula and the values 1366
, 150889
and 714025
come from Numerical recipes in C (1992, by William H.Press, 2nd ed.), chapter 7: random numbers, specifically p.284 and 285.
The book is not directly indexable on the web but readable through an interface here: http://apps.nrbook.com/c/index.html .It's also cited as a reference in various source code implementing PRNGs.
Among the algorithms discussed in this chapter, the one used above is very simple and relatively effective. The formula to get a new random number from a previous one (jran
) is:
jran = (jran * ia + ic) % im;
ran = (float) jran / (float) im; /* normalize into the 0..1 range */
where jran
is the current random integer.
This generator will necessarily loop over itself after a certain number of values (the "period"), so the constants ia
, ic
and im
have to be chosen carefully for that period to be as large as possible. The book provides a table p.285 where constants are suggested for various lengths of the period.
ia=1366
, ic=150889
and im=714025
is one of the entries for a period of
229 bits, which is way more than needed.
Finally the multiplication by 32767
or 215-1 is not part of the PRNG but meant to produce a positive half-integer from the 0..1 pseudo-random float value. Don't change that part, unless to widen the blocksize of the algorithm.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With