Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Correctly setting random seeds for repeatability

The method for setting random seeds using the Fortran 90 subroutine random_seed is quite straightforward.

call random_seed( put=seed )

But I can't find any information about guidelines for setting the seed (which is absolutely necessary when you want repeatability). Folklore I've heard in the past suggested that scalar seeds should be large. E.g. 123456789 is a better seed than 123. The only support for this I can find on the web is that it is suggested for the ifort extension function ran() that one use a "large, odd integer value"

I understand this might be implementation specific and am using gfortran 4.8.5 but am also interested in ifort and (if possible) general guidelines that are independent of implementation. Here's some example code:

# for compactness, assume seed size of 4, but it will depend on 
# the implementation (e.g. for my version of gfortran 4.8.5 it is 12)

seed1(1:4) = [ 123456789, 987654321, 456789123, 7891234567 ]
seed2(1:4) =   123456789
seed3(1:4) = [         1,         2,         3,          4 ]

I'd guess that seed1 is fine, but pretty verbose if you're setting it manually (as I am) as seed length can be 12 or 33 or whatever. And I'm not even sure that it's fine because I haven't been able to find any guidelines at all about setting these seeds. I.e. these seeds should be negative for all I know, or 3 digit even numbers, etc. although I guess you'd hope the implementation would warn you about that (?).

seed2 and seed3 are obviously more convenient to set and for all I know are just as good. @Ross suggests that seed2 is in fact fine in his answer here: Random number generator (RNG/PRNG) that returns updated value of seed

So my question in summary is just: How can I correctly set the seed? Are any or all of seed1 to seed3 acceptable?

like image 975
JohnE Avatar asked Aug 17 '18 10:08

JohnE


2 Answers

Guidelines for setting the seed depend on the PRNG algorithm that RANDOM_NUMBER uses, but in general the more "entropy" you provide the better.

If you have a single scalar value, you can use some simple PRNG to expand that to the full seed array required by RANDOM_SEED. See e.g. the function lcg in the example code at https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gfortran/RANDOM_005fSEED.html

Current versions of GFortran have some protections against bad seeds, and it should be relatively immune towards "dumb" seeding (e.g. all values of seed(:) identical, or all values small or even zero), but for portability to other compilers following something like I suggested above might still be a good idea.

like image 176
janneb Avatar answered Sep 30 '22 12:09

janneb


What you provide to random_seed( put=... ) is used to determine the starting state of the generator, which (as janneb states) should have as much entropy as reasonably possible. You could construct some relatively sophisticated method of generating this entropy - grabbing from the system somehow is a good choice for this. The code janneb links is a good example.

However, I typically like to be able to reproduce a single run from a given seed if necessary. This is useful for debugging and regression testing. Then, for production runs, the code can pull a single seed 'randomly' somehow. Therefore, I want to get good RNG from a single 'seed'. In my experience, this is easily achieved by providing this single seed then letting the generator add entropy by generating numbers. Consider the following example:

program main
   implicit none
   integer, parameter :: wp = selected_real_kind(15,307)
   integer, parameter :: n_discard = 100

   integer :: state_size, i
   integer, allocatable, dimension(:) :: state
   real(wp) :: ran, oldran

   call random_seed( size=state_size )
   write(*,*) '-- state size is: ', state_size

   allocate(state(state_size))

   ! -- Simple method of initializing seed from single scalar
   state = 20180815
   call random_seed( put=state )

   ! -- 'Prime' the generator by pulling the first few numbers
   ! -- In reality, these would be discarded but I will print them for demonstration
   ran = 0.5_wp
   do i=1,n_discard
      oldran = ran
      call random_number(ran)
      write(*,'(a,i3,2es26.18)') 'iter, ran, diff: ', i, ran, ran-oldran
   enddo

   ! Now the RNG is 'ready'
end program main

Here, I give a single seed, and then generate a random number 100 times. Typically, I would discard these initial, potentially corrupted, numbers. In this example, I'm printing them to see whether they look non-random. Running with PGI 15.10:

enet-mach5% pgfortran --version

pgfortran 15.10-0 64-bit target on x86-64 Linux -tp sandybridge 
The Portland Group - PGI Compilers and Tools
Copyright (c) 2015, NVIDIA CORPORATION.  All rights reserved.
enet-mach5% pgfortran main.f90 && ./a.out
 -- state size is:            34
iter, ran, diff:   1  8.114813341476008191E-01  3.114813341476008191E-01
iter, ran, diff:   2  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   3  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   4  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   5  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   6  2.172220012214012286E-01 -5.942593329261995905E-01
iter, ran, diff:   7  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:   8  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:   9  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:  10  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:  11  6.229626682952016381E-01  4.057406670738004095E-01
iter, ran, diff:  12  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  13  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  14  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  15  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  16  2.870333536900204763E-02 -5.942593329261995905E-01
iter, ran, diff:  17  2.870333536900204763E-02  0.000000000000000000E+00
iter, ran, diff:  18  4.344440024428024572E-01  4.057406670738004095E-01
iter, ran, diff:  19  4.344440024428024572E-01  0.000000000000000000E+00
iter, ran, diff:  20  4.344440024428024572E-01  0.000000000000000000E+00
iter, ran, diff:  21  8.401846695166028667E-01  4.057406670738004095E-01
iter, ran, diff:  22  8.401846695166028667E-01  0.000000000000000000E+00
iter, ran, diff:  23  6.516660036642036857E-01 -1.885186658523991809E-01
iter, ran, diff:  24  6.516660036642036857E-01  0.000000000000000000E+00
iter, ran, diff:  25  6.516660036642036857E-01  0.000000000000000000E+00
iter, ran, diff:  26  5.740667073800409526E-02 -5.942593329261995905E-01
iter, ran, diff:  27  5.740667073800409526E-02  0.000000000000000000E+00
iter, ran, diff:  28  2.746286719594053238E-01  2.172220012214012286E-01
iter, ran, diff:  29  2.746286719594053238E-01  0.000000000000000000E+00
iter, ran, diff:  30  2.746286719594053238E-01  0.000000000000000000E+00
iter, ran, diff:  31  6.803693390332057334E-01  4.057406670738004095E-01
iter, ran, diff:  32  6.803693390332057334E-01  0.000000000000000000E+00
iter, ran, diff:  33  3.033320073284073715E-01 -3.770373317047983619E-01
iter, ran, diff:  34  3.033320073284073715E-01  0.000000000000000000E+00
iter, ran, diff:  35  7.090726744022077810E-01  4.057406670738004095E-01
iter, ran, diff:  36  1.148133414760081905E-01 -5.942593329261995905E-01
iter, ran, diff:  37  1.148133414760081905E-01  0.000000000000000000E+00
iter, ran, diff:  38  1.435166768450102381E-01  2.870333536900204763E-02
iter, ran, diff:  39  1.435166768450102381E-01  0.000000000000000000E+00
iter, ran, diff:  40  3.607386780664114667E-01  2.172220012214012286E-01
iter, ran, diff:  41  7.664793451402118762E-01  4.057406670738004095E-01
iter, ran, diff:  42  7.664793451402118762E-01  0.000000000000000000E+00
iter, ran, diff:  43  2.009233475830143334E-01 -5.655559975571975428E-01
iter, ran, diff:  44  2.009233475830143334E-01  0.000000000000000000E+00
iter, ran, diff:  45  6.353673500258167905E-01  4.344440024428024572E-01
iter, ran, diff:  46  4.110801709961720007E-02 -5.942593329261995905E-01
iter, ran, diff:  47  4.110801709961720007E-02  0.000000000000000000E+00
iter, ran, diff:  48  8.812926866162200668E-01  8.401846695166028667E-01
iter, ran, diff:  49  8.812926866162200668E-01  0.000000000000000000E+00
iter, ran, diff:  50  9.386993573542241620E-01  5.740667073800409526E-02
iter, ran, diff:  51  3.444400244280245715E-01 -5.942593329261995905E-01
iter, ran, diff:  52  7.501806915018249811E-01  4.057406670738004095E-01
iter, ran, diff:  53  9.961060280922282573E-01  2.459253365904032762E-01
iter, ran, diff:  54  9.961060280922282573E-01  0.000000000000000000E+00
iter, ran, diff:  55  8.221603419923440015E-02 -9.138899938929938571E-01
iter, ran, diff:  56  4.879567012730348097E-01  4.057406670738004095E-01
iter, ran, diff:  57  1.109193695682364478E-01 -3.770373317047983619E-01
iter, ran, diff:  58  7.625853732324401335E-01  6.516660036642036857E-01
iter, ran, diff:  59  7.625853732324401335E-01  0.000000000000000000E+00
iter, ran, diff:  60  2.831393817822487335E-01 -4.794459914501914000E-01
iter, ran, diff:  61  6.888800488560491431E-01  4.057406670738004095E-01
iter, ran, diff:  62  7.462867195940532383E-01  5.740667073800409526E-02
iter, ran, diff:  63  8.036933903320573336E-01  5.740667073800409526E-02
iter, ran, diff:  64  8.036933903320573336E-01  0.000000000000000000E+00
iter, ran, diff:  65  1.644320683984688003E-01 -6.392613219335885333E-01
iter, ran, diff:  66  5.701727354722692098E-01  4.057406670738004095E-01
iter, ran, diff:  67  6.849860769482774003E-01  1.148133414760081905E-01
iter, ran, diff:  68  1.481334147600819051E-01 -5.368526621881954952E-01
iter, ran, diff:  69  5.538740818338823146E-01  4.057406670738004095E-01
iter, ran, diff:  70  1.605380964906970576E-01 -3.933359853431852571E-01
iter, ran, diff:  71  5.662787635644974671E-01  4.057406670738004095E-01
iter, ran, diff:  72  7.672021111475118005E-01  2.009233475830143334E-01
iter, ran, diff:  73  6.360901160331167148E-01 -1.311119951143950857E-01
iter, ran, diff:  74  6.647934514021187624E-01  2.870333536900204763E-02
iter, ran, diff:  75  9.231234697231371911E-01  2.583300183210184287E-01
iter, ran, diff:  76  3.288641367969376006E-01 -5.942593329261995905E-01
iter, ran, diff:  77  5.034149292976053403E-02 -2.785226438671770666E-01
iter, ran, diff:  78  3.249701648891658579E-01  2.746286719594053238E-01
iter, ran, diff:  79  4.110801709961720007E-01  8.611000610700614288E-02
iter, ran, diff:  80  7.268168600551945246E-01  3.157366890590225239E-01
iter, ran, diff:  81  1.325575271289949342E-01 -5.942593329261995905E-01
iter, ran, diff:  82  2.147735613282293343E-01  8.221603419923440015E-02
iter, ran, diff:  83  8.951429003614350677E-01  6.803693390332057334E-01
iter, ran, diff:  84  9.606624794444940107E-02 -7.990766524169856666E-01
iter, ran, diff:  85  8.749502748152764298E-01  7.788840268708270287E-01
iter, ran, diff:  86  6.864316089628772488E-01 -1.885186658523991809E-01
iter, ran, diff:  87  3.753116578189263919E-01 -3.111199511439508569E-01
iter, ran, diff:  88  4.614216639259325348E-01  8.611000610700614288E-02
iter, ran, diff:  89  8.632683590919612016E-01  4.018466951660286668E-01
iter, ran, diff:  90  5.110403908483931446E-01 -3.522279682435680570E-01
iter, ran, diff:  91  3.512250603649960112E-01 -1.598153304833971333E-01
iter, ran, diff:  92  2.984351275420635830E-01 -5.278993282293242828E-02
iter, ran, diff:  93  7.902858007228701354E-01  4.918506731808065524E-01
iter, ran, diff:  94  9.136098520217217356E-01  1.233240512988516002E-01
iter, ran, diff:  95  8.360105557375590024E-01 -7.759929628416273317E-02
iter, ran, diff:  96  7.623052313611680120E-01 -7.370532437639099044E-02
iter, ran, diff:  97  2.525198759725810760E-02 -7.370532437639099044E-01
iter, ran, diff:  98  9.228433278518650695E-01  8.975913402546069619E-01
iter, ran, diff:  99  1.283834133499510699E-01 -7.944599145019139996E-01
iter, ran, diff: 100  7.311534560989940701E-01  6.027700427490430002E-01

8 of the first 10 numbers generated are the repeated! This is a good illustration of why some generators require a high-entropy state in the first place. However, after 'some' time, the numbers start to look reasonable.

For my applications, 100 or so random numbers is a very small cost, so whenever I seed a generator, I prime them in this manner. I didn't observe this obviously bad behavior on ifort 16.0, gfortran 4.8, or gfortran 8.1. Non-repeating numbers is a pretty low bar, though. So I would prime for all compilers, not just ones I have observed bad behavior for.

From the comments, some compilers attempt to eliminate bad behavior by processing the input state in some way to yield the actual internal state. Gfortran uses an "xor cipher". The operation is reversed on a get.

like image 39
Ross Avatar answered Sep 30 '22 12:09

Ross