Ruby implements PRNGs as "a modified Mersenne Twister with a period of 2**19937-1." 1
The way I understand MT is that it operates on 2^32 different seeds. What confuses me is that Random.new(seed)
accepts arbitrarily big numbers such as Random.new(2**100)
.
However, I wasn't able to find (logical) collisions:
Random.new(1).rand(10**5) == Random.new(2**32-1).rand(10**5) => false
Random.new(1).rand(10**5) == Random.new(2**32).rand(10**5) => false
Random.new(1).rand(10**5) == Random.new(2**32+1).rand(10**5) => false
Given that we'd like to utilize MT's maximum seed range in the sense that we want to use as many different seeds as possible while still avoiding collisions with two different seeds, what seed range achieves this?
I tried understanding what is happening inside the Ruby's random implementation, but didn't get too far. https://github.com/ruby/ruby/blob/c5e08b764eb342538884b383f0e6428b6faf214b/random.c#L370
The Mersenne Twister sequence is 2 ** ( 624 * 32 - 1 ) - 1
long, and the seed value is used to set an internal state for the PRNG that directly relates to the position within that sequence.
The easiest-to-find repeat appears to be every 2 ** ( 624 * 32 )
, and can be shown to work like this:
repeat_every = 2 ** ( 624 * 32 )
start_value = 5024214421 # Try any value
r1 = Random.new( start_value )
r2 = Random.new( start_value + repeat_every )
r17 = Random.new( start_value + 17 * repeat_every )
r23 = Random.new( start_value + 23 * repeat_every )
r1.rand == r2.rand
# true
r17.rand == r23.rand
# true
Or try this:
repeat_every = 2 ** ( 624 * 32 )
start_value = 5024214421 # Try any value
r1 = Random.new( start_value )
r2 = Random.new( start_value + repeat_every )
Array.new(10) { r1.rand(100) }
# => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50]
Array.new(10) { r2.rand(100) }
# => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50]
The repeat value relates to how Mersenne Twister works. The internal state of MT is an array of 624 32-bit unsigned integers. The Ruby source code you linked packs a Ruby Fixnum into an array - the magic command is
rb_integer_pack( seed, buf, len, sizeof(uint32_t), 0,
INTEGER_PACK_LSWORD_FIRST|INTEGER_PACK_NATIVE_BYTE_ORDER );
however, this isn't something easy to play with, it is defined in internal.h
, so only really accessible if you work on Ruby interpreter itself. You cannot access this function from within a normal C extension.
The packed integer is then loaded to the MT's internal state by the function init_by_array
. This is quite a complex-looking function - the packed seed value is not written literally into the state, but instead the state is generated item by item, adding in the supplied array values, using a variety of xors, additions and cross-referencing the previous value (the Ruby source here also adds in the packed array's index position, commented "non-linear", I think that is one of the referenced modifications to standard MT)
Note that the size of the MT sequence is smaller than 2 ** ( 624 * 32 )
- the repeat_every
value I show here is skipping over 2 sequences at a time, but it is the easiest to find repeating seed value, because it is easy to see how it sets the internal state exactly the same (because the first 624 items in the array representation of the seed are identical, and that is all that gets used later). Other seed values will also produce the same internal state, but the relationship is a complex mapping that pairs up each integer in the 19938-bit space with another integer which creates the same state for MT.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With