Make PRNGs Agree Across Software

Tags:

I am investigating whether it is possible to have two sets of software agree on a sequence of produced pseudo-random numbers. I am as interested in understanding all the possible points of divergence as I am in actually finding a way to get them to agree.

Why? I work in a data shop that uses many different software packages (Stata, R, Python, SAS, probably others). There has recently been interest in QCing outputs by replicating processes in another language. For any process that involves random numbers, it would be helpful if we could provide a series of steps ("set this option", etc.) that allow the two packages to agree. If that's not feasible, I'd like to be able to articulate where are the failure points.

A simple example:

Both R and Python's default random number generator is Mersenne-Twister. I set them to the same seed and try to sample from and also look at the "state" of the PRNG. Neither value agrees.

R (3.2.3, 64-bit):

set.seed(20160201)
.Random.seed
sample(c(1, 2, 3, 4, 5))

Python (3.5.1, 64-bit):

import random

random.seed(20160201)
random.getstate()
random.sample([1, 2, 3, 4, 5], 5)

739

asked Feb 01 '16 20:02

ElizabethAB

1 Answers

Old question, but maybe useful to some future reader: As alluded in the comments, your best bet is to implement this your self and provide interfaces for the different environments such that for a given seed the same results are returned. Why is that necessary? You used "sampling" as an example. There are several steps involved.

Seeding is a non-trivial process. For example R goes as far as to further scramble the provided seed. So unless you user tools use the same method, they will end up with a different seed even when the user supplies the same value.
The actual RNG: Even though in both cases Mersenne-Twister might be used, is it really the same version that is used? R uses a 32bit MT. Maybe Python uses a 64bit version?
Most RNGs give you an unsigned integer (nowadays typically 32 or 64bits). But you will need some distribution of random numbers, e.g. for sampling you need random integers within a given range. There are many methods to go from the integers produced by the RNG to those needed for sampling. In the case of R, you do not even have access to the output value of the RNG. The most fundamental function is R_unif which returns a double in [0, 1). Again, how to generate such a double is not universally agreed on. And if you need other distribution functions (normal, exponential, ...) you will find quite a few different algorithms for them.

Overall there are to many places where (subtle) differences can creep in.

121

answered Oct 22 '22 02:10

Ralf Stubner

Related questions
                            
                                Python Pandas to_clipboard() UnicodeEncodeError: 'ascii' codec can't encode character
                            
                                What does "the choice must be consistent for all consumers" mean?
                            
                                mlock a variable in python [duplicate]
                            
                                must build Spark with Hive (spark 1.5.0)
                            
                                Choosing a framework for larger than memory data analysis with python
                            
                                Converting to Jython a Python 3.5 project - UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 4-10: illegal Unicode character
                            
                                RabbitMQ heartbeat vs connection drain events timeout
                            
                                Drop frame while debugging?
                            
                                Python login into a website with javascript form
                            
                                Hashing tuple in Python causing different results in different systems
                            
                                Python: downsample 2D numpy array by a non-integer factor
                            
                                PyMySQL in Flask/Apache sometimes returning empty result
                            
                                Is it Pythonic to have "header" comments in a Python script
                            
                                django-rest-swagger returns with Internal Server Error
                            
                                Running python from a conda environment on windows
                            
                                Why is the main thread of my multithreaded application unresponsive to Ctrl+C?
                            
                                pandas Styler. How to ignore the index column from the rendered HTML
                            
                                Pandas DataFrame.to_sql() error - not all arguments converted during string formatting
                            
                                Retrieving the request object in a custom Django logging handler
                            
                                Implementing CSRF protection in a Python REST API

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Make PRNGs Agree Across Software

Tags:

python

random

r

ElizabethAB

People also ask

1 Answers

Ralf Stubner

Recent Activity

Donate For Us