Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What numbers that I can put in numpy.random.seed()?

I have noticed that you can put various numbers inside of numpy.random.seed(), for example numpy.random.seed(1), numpy.random.seed(101). What do the different numbers mean? How do you choose the numbers?

like image 768
Shelly Avatar asked Apr 25 '16 17:04

Shelly


People also ask

Why is seed 42?

What is the significance of random. seed(42) ? It's a pop-culture reference! In Douglas Adams's popular 1979 science-fiction novel The Hitchhiker's Guide to the Galaxy, towards the end of the book, the supercomputer Deep Thought reveals that the answer to the great question of “life, the universe and everything” is 42.

How does NumPy generate random numbers?

Numpy's random number routines produce pseudo random numbers using combinations of a BitGenerator to create sequences and a Generator to use those sequences to sample from different statistical distributions: BitGenerators: Objects that generate random numbers.

What is seed value?

A seed value specifies a particular stream from a set of possible random number streams. When you specify a seed, SAS generates the same set of pseudorandom numbers every time you run the program.

What does random seed 0 do in Python?

Definition and Usage The seed() method is used to initialize the random number generator. The random number generator needs a number to start with (a seed value), to be able to generate a random number. By default the random number generator uses the current system time.


3 Answers

Consider a very basic random number generator:

Z[i] = (a*Z[i-1] + c) % m

Here, Z[i] is the ith random number, a is the multiplier and c is the increment - for different a, c and m combinations you have different generators. This is known as the linear congruential generator introduced by Lehmer. The remainder of that division, or modulus (%), will generate a number between zero and m-1 and by setting U[i] = Z[i] / m you get random numbers between zero and one.

As you may have noticed, in order to start this generative process - in order to have a Z[1] you need to have a Z[0] - an initial value. This initial value that starts the process is called the seed. Take a look at this example:

enter image description here

The initial value, the seed is determined as 7 to start the process. However, that value is not used to generate a random number. Instead, it is used to generate the first Z.

The most important feature of a pseudo-random number generator would be its unpredictability. Generally, as long as you don't share your seed, you are fine with all seeds as the generators today are much more complex than this. However, as a further step you can generate the seed randomly as well. You can skip the first n numbers as another alternative.

Main source: Law, A. M. (2007). Simulation modeling and analysis. Tata McGraw-Hill.

like image 150
ayhan Avatar answered Oct 14 '22 05:10

ayhan


The short answer:

There are three ways to seed() a random number generator in numpy.random:

  • use no argument or use None -- the RNG initializes itself from the OS's random number generator (which generally is cryptographically random)

  • use some 32-bit integer N -- the RNG will use this to initialize its state based on a deterministic function (same seed → same state)

  • use an array-like sequence of 32-bit integers n0, n1, n2, etc. -- again, the RNG will use this to initialize its state based on a deterministic function (same values for seed → same state). This is intended to be done with a hash function of sorts, although there are magic numbers in the source code and it's not clear why they are doing what they're doing.

If you want to do something repeatable and simple, use a single integer.

If you want to do something repeatable but unlikely for a third party to guess, use a tuple or a list or a numpy array containing some sequence of 32-bit integers. You could, for example, use numpy.random with a seed of None to generate a bunch of 32-bit integers (say, 32 of them, which would generate a total of 1024 bits) from the OS's RNG, store in some seed S which you save in some secret place, then use that seed to generate whatever sequence R of pseudorandom numbers you wish. Then you can later recreate that sequence by re-seeding with S again, and as long as you keep the value of S secret (as well as the generated numbers R), no one would be able to reproduce that sequence R. If you just use a single integer, there's only 4 billion possibilities and someone could potentially try them all. That may be a bit on the paranoid side, but you could do it.


Longer answer

The numpy.random module uses the Mersenne Twister algorithm, which you can confirm yourself in one of two ways:

  • Either by looking at the documentation for numpy.random.RandomState, of which numpy.random uses an instance for the numpy.random.* functions (but you can also use an isolated independent instance of)

  • Looking at the source code in mtrand.pyx which uses something called Pyrex to wrap a fast C implementation, and randomkit.c and initarray.c.

In any case here's what the numpy.random.RandomState documentation says about seed():

Compatibility Guarantee A fixed seed and a fixed series of calls to RandomState methods using the same parameters will always produce the same results up to roundoff error except when the values were incorrect. Incorrect values will be fixed and the NumPy version in which the fix was made will be noted in the relevant docstring. Extension of existing parameter ranges and the addition of new parameters is allowed as long the previous behavior remains unchanged.

Parameters:
seed
: {None, int, array_like}, optional

Random seed used to initialize the pseudo-random number generator. Can be any integer between 0 and 2**32 - 1 inclusive, an array (or other sequence) of such integers, or None (the default). If seed is None, then RandomState will try to read data from /dev/urandom (or the Windows analogue) if available or seed from the clock otherwise.

It doesn't say how the seed is used, but if you dig into the source code it refers to the init_by_array function: (docstring elided)

def seed(self, seed=None):
    cdef rk_error errcode
    cdef ndarray obj "arrayObject_obj"
    try:
        if seed is None:
            with self.lock:
                errcode = rk_randomseed(self.internal_state)
        else:
            idx = operator.index(seed)
            if idx > int(2**32 - 1) or idx < 0:
                raise ValueError("Seed must be between 0 and 2**32 - 1")
            with self.lock:
                rk_seed(idx, self.internal_state)
    except TypeError:
        obj = np.asarray(seed).astype(np.int64, casting='safe')
        if ((obj > int(2**32 - 1)) | (obj < 0)).any():
            raise ValueError("Seed must be between 0 and 2**32 - 1")
        obj = obj.astype('L', casting='unsafe')
        with self.lock:
            init_by_array(self.internal_state, <unsigned long *>PyArray_DATA(obj),
                PyArray_DIM(obj, 0))

And here's what the init_by_array function looks like:

extern void
init_by_array(rk_state *self, unsigned long init_key[], npy_intp key_length)
{
    /* was signed in the original code. RDH 12/16/2002 */
    npy_intp i = 1;
    npy_intp j = 0;
    unsigned long *mt = self->key;
    npy_intp k;

    init_genrand(self, 19650218UL);
    k = (RK_STATE_LEN > key_length ? RK_STATE_LEN : key_length);
    for (; k; k--) {
        /* non linear */
        mt[i] = (mt[i] ^ ((mt[i - 1] ^ (mt[i - 1] >> 30)) * 1664525UL))
            + init_key[j] + j;
        /* for > 32 bit machines */
        mt[i] &= 0xffffffffUL;
        i++;
        j++;
        if (i >= RK_STATE_LEN) {
            mt[0] = mt[RK_STATE_LEN - 1];
            i = 1;
        }
        if (j >= key_length) {
            j = 0;
        }
    }
    for (k = RK_STATE_LEN - 1; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL))
             - i; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++;
        if (i >= RK_STATE_LEN) {
            mt[0] = mt[RK_STATE_LEN - 1];
            i = 1;
        }
    }

    mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
    self->gauss = 0;
    self->has_gauss = 0;
    self->has_binomial = 0;
}

This essentially "munges" the random number state in a nonlinear, hash-like method using each value within the provided sequence of seed values.

like image 38
Jason S Avatar answered Oct 14 '22 06:10

Jason S


What is normally called a random number sequence in reality is a "pseudo-random" number sequence because the values are computed using a deterministic algorithm and probability plays no real role.

The "seed" is a starting point for the sequence and the guarantee is that if you start from the same seed you will get the same sequence of numbers. This is very useful for example for debugging (when you are looking for an error in a program you need to be able to reproduce the problem and study it, a non-deterministic program would be much harder to debug because every run would be different).

like image 23
Arshdeep Singh Avatar answered Oct 14 '22 06:10

Arshdeep Singh