Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How exactly does random.random() work in python?

Tags:

python

random

I am a bit confused about how the random.random() function works in python.

The docs say that it 'Return the next random floating point number in the range [0.0, 1.0)'. I understand that pseudo-random number generators work by performing some operation on a value. Generally this value is the previous number generated by the generator. So I think that's what 'next random floating point' means here. (Please correct me if I am wrong)

But when I saw the source code of the random library, random function is not defined in the class Random. Instead, its defined in the class SystemRandom as follows (line 671 of the code):

 def random(self):
        """Get the next random number in the range [0.0, 1.0)."""
        return (int.from_bytes(_urandom(7), 'big') >> 3) * RECIP_BPF

If I understand this correctly, this function generates a random number using os.urandom. Which, according to the documentation, returns random bytes from an OS-specific randomness source. So this will not give the 'next' floating point random number.

How are the two connected? or are they two different things?

I am quite confused here. Any kind of help would be appreciated.

Thanks!

like image 608
user324 Avatar asked Feb 02 '17 09:02

user324


People also ask

How does Python random library work?

The random module in python contains two interfaces(classes) of pseudorandom number generators(PRNGs). You can view it as two ways to generate random numbers. SystemRandom uses either the /dev/urandom file on POSIX systems or the CryptGenRandom() function on Windows NT systems. Both are Cryptographically secure PRNGs.

How does the random function work?

Random number generators use mathematical formulas that transfer set of numbers to another one. If, for example, you take a constant number N and another number n_0 , and then take the value of n mod N (the modulo operator), you will get a new number n_1 , which looks as it if is unrelated to n_0 .

How does Python generate random data?

Use a random. rand(d0, d1, …, dn) function to generate an n-dimensional array of random float numbers in the range of [0.0, 1.0) . Use a random. uniform(low=0.0, high=1.0, size=None) function to generate an n-dimensional array of random float numbers in the range of [low, high) .


2 Answers

The random module in python contains two interfaces(classes) of pseudorandom number generators(PRNGs). You can view it as two ways to generate random numbers.

  • Random uses the Mersenne Twister PRNG. It is not cryptographically secure
  • SystemRandom uses either the /dev/urandom file on POSIX systems or the CryptGenRandom() function on Windows NT systems. Both are Cryptographically secure PRNGs.

A note on the module secrets.

The module secrets does not implement any type of PRNG but provides helper functions(which is awesome because we don't have to write them ourselves) based on SystemRandom and os.urandom(which SystemRandom is based upon). The comments are mine:

from random import SystemRandom
_sysrand = SystemRandom() #secrets._sysrand
randbits = _sysrand.getrandbits #secrets.randbits
choice = _sysrand.choice #secrets.choice

def randbelow(exclusive_upper_bound): #secrets.randbelow
    ...
    return _sysrand._randbelow(exclusive_upper_bound) #uses SystemRandom

def token_bytes(nbytes=None): #secrets.token_bytes
    ...
    return os.urandom(nbytes)

def token_hex(nbytes=None): #secrets.token_hex(uses token_bytes())
    ...
    return binascii.hexlify(token_bytes(nbytes)).decode('ascii')

def token_urlsafe(nbytes=None): # secrets.token_urlsafe(uses token_bytes())
    ...
    tok = token_bytes(nbytes)
    return base64.urlsafe_b64encode(tok).rstrip(b'=').decode('ascii')

How does Random.random() work?

random.random() is defined in the 'random.py' module on line 749(for me)

_inst = Random()
...
random = _inst.random

The class random.Random() does not define the random() method per se but inherits _random.Random()(which does define a method called random()), which is a class called Random() located at the module _random.

The C source code of the _random(it is a built-in module) module can be found here(it is actually called _randommodule.c. See explanation below)

Naming convention for python modules written in C/C++

(Historically, if a module is called spam, the C file containing its implementation is called spammodule.c; if the module name is very long, like spammify, the module name can be just spammify.c.)

The _random.Random.random()(or random.random()) method is defined as _random_Random_random_impl() in the _randommodule.c file.

static PyObject *
_random_Random_random_impl(RandomObject *self)
{
    uint32_t a=genrand_int32(self)>>5, b=genrand_int32(self)>>6;
    return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
}

genrand_int32() is a function defined by the Mersenne Twister PRNG implementation which returns a 4-byte number.

How does SystemRandom().random() work?

(I know you didn't ask for SystemRandom(), but at the time I wrote this I hadn't realized)

I've made this image as an overview of my answer(However, I do encourage you to read it all)

Overview of SystemRandom's random() method

SystemRandom().random() is defined in the module random.py.

 ...
 def random(self):
    """Get the next random number in the range [0.0, 1.0)."""
    return (int.from_bytes(_urandom(7), 'big') >> 3) * RECIP_BPF**strong text**

The function uses another function called urandom() defined in the module os.py

from os import urandom as _urandom

The os.py module does not define the function urandom() itself but imports it from a built-in module. os.py will import the posix built-in module if you are on a POSIX OS or the nt built-in module if you are on a Windows NT OS. These modules contain the definition for urandom().

if 'posix' in _names:
    name = 'posix'
    linesep = '\n'
    from posix import *

OR

elif 'nt' in _names:
    name = 'nt'
    linesep = '\r\n'
    from nt import *

posix and nt are built-in modules, so they don't have the __file__ attribute.

Diving in the source code:

POSIX

  • urandom() is defined in the posixmodule.c as os_urandom_impl() which calls _PyOS_URandom().
static PyObject *
os_urandom_impl(PyObject *module, Py_ssize_t size)
{
  ...
  bytes = PyBytes_FromStringAndSize(NULL, size);
  ...
  result = _PyOS_URandom(PyBytes_AS_STRING(bytes), PyBytes_GET_SIZE(bytes));
  ...
  return bytes
}
  • _PyOS_URandom() is defined in the bootstrap_hash.c file which then calls pyurandom()
int
_PyOS_URandom(void *buffer, Py_ssize_t size)
{
    return pyurandom(buffer, size, 1, 1);
}
  • pyurandom() is defined in the bootstrap_hash.c file which then calls dev_urandom().
static int
pyurandom(void *buffer, Py_ssize_t size, int blocking, int raise)
{
  ...
  return dev_urandom(buffer, size, raise);
  ...
}
  • dev_urandom is defined in the bootstrap_hash.c file which then uses the /dev/urandom directory to get random bytes.
static int
dev_urandom(char *buffer, Py_ssize_t size, int raise)
{
  ...
  fd = _Py_open("/dev/urandom", O_RDONLY);
  ...
  do {
    n = _Py_read(fd, buffer, (size_t)size);
    ...
  } while (0 < size);
  ...
}

Windows NT

It may look a bit weird(I thought that as well) but the posixmodule.c file is also used for the NT systems, here is a quote(comment) from the beginning of the file

This file is also used for Windows NT/MS-Win. In that case the
module actually calls itself 'nt', not 'posix', and a few functions are either unimplemented or implemented differently. The source
assumes that for Windows NT, the macro 'MS_WINDOWS' is defined independent of the compiler used. Different compilers define their own feature test macro, e.g. '_MSC_VER'.

For Windows NT the function call chain is the same as for POSIX until the pyurandom() function

  • pyurandom() is defined in the bootstrap_hash.c file which then calls win32_urandom().
static int
pyurandom(void *buffer, Py_ssize_t size, int blocking, int raise)
{
  ...
  #ifdef MS_WINDOWS
      return win32_urandom((unsigned char *)buffer, size, raise);
  #else
  ...
}
  • win32_urandom() is defined in the bootstrap_hash.c file which then calls CryptGenRandom().
static int
win32_urandom(unsigned char *buffer, Py_ssize_t size, int raise)
{
  ...
  if (!CryptGenRandom(hCryptProv, chunk, buffer))
  {
  ...
  }
  ...
  return 0;
}
  • CryptGenRandom() is declared in the wincrypt.h file and defined in the Advapi32.lib and Advapi32.dll libraries(These files are provided by Microsoft)
like image 127
Blastosphere Avatar answered Oct 16 '22 17:10

Blastosphere


random.random() is actually defined here:

random = _inst.random

However, it is just a reference to C implementation.

Here is a quote from the source:

General notes on the underlying Mersenne Twister core generator:

  • The period is 2**19937-1.
  • It is one of the most extensively tested generators in existence.
  • The random() method is implemented in C, executes in a single Python step, and is, therefore, threadsafe.

You probably want to look at the article on Mersenne Twister. To speak briefly, the state of the generator is not the same as "previous number", it is much more complicated thing. So you are wrong in «…pseudo-random number generators work by performing some operation on a value. Generally this value is the previous number generated by the generator».

As for SystemRandom.random(), it is in a way unrelated to random.random(). It is possible in Python that function with the same name imported from different modules are different, so you can't rely on function's name here.

like image 5
Ilya V. Schurov Avatar answered Oct 16 '22 18:10

Ilya V. Schurov