Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I cheaply produce an infinite splittable stream of non-colliding numbers from a single seed?

Tags:

I need a function split : Word64 -> (Word64, Word64) that takes any Word64 and splits it into two different Word64s in a way that I can keep splitting children and grandchildren in any arbitrary order while avoiding a collision. That is, for any pair sa, sb of consecutive splittings of a seed (such as sa = fst.split.fst.split$ seed), sa must be different from sb with >99.99% odds.

I've thought in using a pairing function, but that makes children exponentially bigger than parents, so, after a few splits there is an integer overflow. I need something that basically sends any value on the space of possible Word64 bits to two other values in a semi randomly fashion. Also, I need it to be as quick and simple as possible. The fewer instructions, the better. It is probably a very stupid calculation that I'm missing.

What can be used here?

Disclaimer: I have asked similar questions before, but now I finally have a better understanding of the problem and know exactly what I need.

like image 818
MaiaVictor Avatar asked Apr 02 '16 20:04

MaiaVictor


2 Answers

Unfortunately, I can't prove any statistical properties on the following, so it is just a heuristic. It is based on the xorshift family of pseudo-random sequence generators, which is known to be very fast.

Consider the xorshift* algorithm, here shown in C:

#include <stdint.h>

uint64_t x; /* The state must be seeded with a nonzero value. */

uint64_t xorshift64star(void) {
    x ^= x >> 12; // a
    x ^= x << 25; // b
    x ^= x >> 27; // c
    return x * UINT64_C(2685821657736338717);
}

The explanation in the wikipedia entry states:

A xorshift* generator takes a xorshift generator, and applies an invertible multiplication (modulo the word size) to its output as a non-linear transformation.

The heuristic algorithm I suggest is:

  • Give the head node some nonzero value x.

  • For each split, run the algorithm above with x at the beginning as the value of the parent. However, for the return value

    • for the left child, replace the last line with return x; (essentially the classic xorshift algorithm)

    • for the right child, use the last line as stated above.

At any node u, take any two nonidentical descendant paths leading to distinct v and w. Then either

  • v is a descendant of w, and thus at least either xorshift or xorshift* was applied at least once to the value of w to obtain the value of v.

  • w is a descendant of v, and the same argument can be made.

  • neither node is a parent of the other. In this case, at least a single non-linear transformation was made along the path leading from u to the lowest common ancestor of v and w (possibly u itself).

like image 189
Ami Tavory Avatar answered Sep 28 '22 03:09

Ami Tavory


Also starting with a disclaimer: I'm not a cryptographer and this might be really bad.

However, you can try \x -> (x*3+1, x*3+2). This relies on modulo-2^64 semantics for integer overflow. Quick tests didn't reveal any obvious collision patterns. The idea is that multiplication by 3 does not care about integer overflow. In particular, 3^n is always different value from 2*3^n, doesn't matter how big n.

This is the test I've done:

import Control.Monad
import Data.Int
import Text.Printf
import System.Random (randomRIO)
import qualified Data.Set as Set

split :: Int32 -> (Int32, Int32)
split x = (x * 3 + 1, x * 3 + 2)

bools bias n =
  replicateM n (randomRIO (0, 1.0) >>= \x -> if x < bias then return False else return True)

gogo bias =
  bools bias 100000 >>= \l -> do
    let things = scanl (\x b -> if b then fst (split x) else snd (split x)) 0 l
    printf "%.2f %d %d\n" bias (length things) (length $ Set.toList (Set.fromList things))
    return things

main = do
  things <- mapM gogo [0.0 :: Double, 0.05 .. 1]
  printf "all  %d %d\n" (length $ concat things) (length $ Set.toList (Set.fromList $ concat things))



0.00 100001 100001
0.05 100001 100001
0.10 100001 100001
0.15 100001 100001
0.20 100001 100001
0.25 100001 100000
0.30 100001 99997
0.35 100001 99998
0.40 100001 99999
0.45 100001 99999
0.50 100001 100001
0.55 100001 100000
0.60 100001 100001
0.65 100001 100001
0.70 100001 99998
0.75 100001 99999
0.80 100001 100001
0.85 100001 100001
0.90 100001 100001
0.95 100001 100001
1.00 100001 100001
all  2100021 2099351

The number of collisions seems to roughly agree with what one might expect from a random sample of that size.

like image 23
Rotsor Avatar answered Sep 28 '22 02:09

Rotsor