Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the effective seed range for Ruby's rand?

Ruby implements PRNGs as "a modified Mersenne Twister with a period of 2**19937-1." 1

The way I understand MT is that it operates on 2^32 different seeds. What confuses me is that Random.new(seed) accepts arbitrarily big numbers such as Random.new(2**100).

However, I wasn't able to find (logical) collisions:

Random.new(1).rand(10**5) == Random.new(2**32-1).rand(10**5) => false
Random.new(1).rand(10**5) == Random.new(2**32).rand(10**5) => false
Random.new(1).rand(10**5) == Random.new(2**32+1).rand(10**5) => false

Given that we'd like to utilize MT's maximum seed range in the sense that we want to use as many different seeds as possible while still avoiding collisions with two different seeds, what seed range achieves this?

I tried understanding what is happening inside the Ruby's random implementation, but didn't get too far. https://github.com/ruby/ruby/blob/c5e08b764eb342538884b383f0e6428b6faf214b/random.c#L370

like image 320
randomguy Avatar asked Nov 19 '13 16:11

randomguy


1 Answers

The Mersenne Twister sequence is 2 ** ( 624 * 32 - 1 ) - 1 long, and the seed value is used to set an internal state for the PRNG that directly relates to the position within that sequence.

The easiest-to-find repeat appears to be every 2 ** ( 624 * 32 ), and can be shown to work like this:

 repeat_every =  2 ** ( 624 * 32 )

 start_value = 5024214421  # Try any value

 r1 = Random.new( start_value )

 r2 = Random.new( start_value + repeat_every )

 r17 = Random.new( start_value + 17 * repeat_every )

 r23 = Random.new( start_value + 23 * repeat_every )

 r1.rand == r2.rand  
 # true

 r17.rand == r23.rand  
 # true

Or try this:

 repeat_every =  2 ** ( 624 * 32 )

 start_value = 5024214421  # Try any value

 r1 = Random.new( start_value )

 r2 = Random.new( start_value + repeat_every )

 Array.new(10) { r1.rand(100) }
 # => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50]

 Array.new(10) { r2.rand(100) }
 # => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50]

The repeat value relates to how Mersenne Twister works. The internal state of MT is an array of 624 32-bit unsigned integers. The Ruby source code you linked packs a Ruby Fixnum into an array - the magic command is

  rb_integer_pack( seed, buf, len, sizeof(uint32_t), 0,
        INTEGER_PACK_LSWORD_FIRST|INTEGER_PACK_NATIVE_BYTE_ORDER );

however, this isn't something easy to play with, it is defined in internal.h, so only really accessible if you work on Ruby interpreter itself. You cannot access this function from within a normal C extension.

The packed integer is then loaded to the MT's internal state by the function init_by_array. This is quite a complex-looking function - the packed seed value is not written literally into the state, but instead the state is generated item by item, adding in the supplied array values, using a variety of xors, additions and cross-referencing the previous value (the Ruby source here also adds in the packed array's index position, commented "non-linear", I think that is one of the referenced modifications to standard MT)

Note that the size of the MT sequence is smaller than 2 ** ( 624 * 32 ) - the repeat_every value I show here is skipping over 2 sequences at a time, but it is the easiest to find repeating seed value, because it is easy to see how it sets the internal state exactly the same (because the first 624 items in the array representation of the seed are identical, and that is all that gets used later). Other seed values will also produce the same internal state, but the relationship is a complex mapping that pairs up each integer in the 19938-bit space with another integer which creates the same state for MT.

like image 149
Neil Slater Avatar answered Sep 23 '22 10:09

Neil Slater