Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does bin(hash(random.random())) always end with 8 zero digits?

When I hash a float generated by random.random(), I always get a hash value that ends with 8 zero digits when represented in binary format. You can test this in the following ways:

import random
print(bin(hash(random.random()))) # always returns an int in binary ending with 8 zeros.
# or
print(hash(random.random())%256) # always returns 0

For example:

>>> random.random()
0.014759690362504796
>>> bin(hash(0.014759690362504796)) 
'0b1111000111010010101000001101100111111100101000100000000'
>>> bin(hash(0.014759690362504797)) # replacing the last 6 with a 7 doesn't produce 8 zeros at the end
'0b1111000111010010101000001101100111111100101000100000100'

I understand that this means, that random.random() doesn't generate all possible random values between 0 and 1 and therefore reduces the space of possible random values.

But why do numbers generated by random.random() always have a hash value divisible by 256?

like image 616
redstoner2014 Avatar asked Oct 30 '25 04:10

redstoner2014


2 Answers

Turns out it is not directly connected to Mersenne Twister algorithm, only to the fact that it has 32-bit output that is used to build float (double) value in specific way. The problem is in this six lines of random.random() implementation:

static PyObject *
_random_Random_random_impl(RandomObject *self)
/*[clinic end generated code: output=117ff99ee53d755c input=afb2a59cbbb00349]*/
{
    uint32_t a=genrand_uint32(self)>>5, b=genrand_uint32(self)>>6;
    return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
}

Regardless of genrand_uint32 implementation this function return double value with hash divisable by 256.

For example if we replace genrand_uint32 with secure random generator nothing will changed.

#include <QDebug>
#include <math.h>
#include <QRandomGenerator>

typedef __int64 Py_ssize_t;

typedef Py_ssize_t Py_hash_t;

#define SIZEOF_VOID_P 8

#if SIZEOF_VOID_P >= 8
#  define _PyHASH_BITS 61
#else
#  define _PyHASH_BITS 31
#endif

#define _PyHASH_MODULUS (((size_t)1 << _PyHASH_BITS) - 1)

class PyObject{};

class RandomObject {
public:
    uint32_t generate() {
        return QRandomGenerator::system()->generate();
    }
};

typedef size_t Py_uhash_t;

// cpython-main\Python\pyhash.c
Py_hash_t
_Py_HashDouble(PyObject *inst, double v)
{
    int e, sign;
    double m;
    Py_uhash_t x, y;

    m = frexp(v, &e);

    sign = 1;
    if (m < 0) {
        sign = -1;
        m = -m;
    }

    /* process 28 bits at a time;  this should work well both for binary
       and hexadecimal floating point. */
    x = 0;
    while (m) {
        x = ((x << 28) & _PyHASH_MODULUS) | x >> (_PyHASH_BITS - 28);
        m *= 268435456.0;  /* 2**28 */
        e -= 28;
        y = (Py_uhash_t)m;  /* pull out integer part */
        m -= y;
        x += y;
        if (x >= _PyHASH_MODULUS)
            x -= _PyHASH_MODULUS;
    }

    /* adjust for the exponent;  first reduce it modulo _PyHASH_BITS */
    e = e >= 0 ? e % _PyHASH_BITS : _PyHASH_BITS-1-((-1-e) % _PyHASH_BITS);
    x = ((x << e) & _PyHASH_MODULUS) | x >> (_PyHASH_BITS - e);

    x = x * sign;
    if (x == (Py_uhash_t)-1)
        x = (Py_uhash_t)-2;
    return (Py_hash_t)x;
}

static uint32_t genrand_uint32(RandomObject *self) {
    return self->generate();
}

// cpython-main\Modules\clinic\_randommodule.c.h
static double _random_Random_random_impl(RandomObject *self)
{
    uint32_t a=genrand_uint32(self)>>5, b=genrand_uint32(self)>>6;
    return (a*67108864.0+b)*(1.0/9007199254740992.0);
}

int main(int argc, char *argv[])
{
    RandomObject *randomGenerator = new RandomObject();
    for(int i=0;i<100;i++) {
        double v = _random_Random_random_impl(randomGenerator);
        Py_hash_t h = _Py_HashDouble(nullptr, v);
        qDebug() << "value" << v << "hash" << h << "h % 256" << (h % 256);
    }
    return 0;
}
like image 151
mugiseyebrows Avatar answered Nov 01 '25 19:11

mugiseyebrows


The observed behaviour results from the combination of two things: Properties of the random numbers to be generated, and the way Python hashes floats.

a. It is correct and desired that random.random() does not generate all possible, i.e. representable, values of data type float. In fact, it practically will produce a small subset only, in order to deliver a uniform distribution.

Technically, a random generator may produce a fix point value with 53 binary digits (and just convert them to the float format without loss of information). The resulting 2^53 numbers are spread out uniformly In the interval [0, 1[. Select each of them with the same likelihood and you'll have uniformly distributed random numbers (see also https://github.com/python/cpython/blob/9d38120e335357a3b294277fd5eff0a10e46e043/Lib/random.py#L822).

The float format in turn can represent many more different values, in particular within the interval [0, 2^(-53)[. This is a very small portion of [0, 1[ only, but most representable float values within [0, 1[ will be found within this small interval, e.g. all the values with any mantissa and exponents between -54 and -1023. (In theory, a generator of uniformly distributed numbers could be allowed to produce any of these small values as well, but with correspondingly small likelihood and you just may never observe any of these.)

b. Python's hashing algorithm for floats maps integer and fraction part bits to an 64 bit int value, virtually by using shift and or operations. For random numbers based exclusively on digits with positional values 2^(-1) to 2^(-53) this results in hash values using exclusively bits number 8 to 60. Bits 0 to 7 and above 60 always remain 0.

The following program may just serve as an illustration which produces hash values identical to the built-in hash function:

def my_hash(x: float) -> int:
    num_bits = 61
    neg, x = x < 0, abs(x)
    # separate integer part (h) and fraction part (f)
    h = int(x)
    f = x - h
    # fold integer part to bits no. 0...num_bits-1, aligning 2^0 to bit 0.
    while h >= 2 ** num_bits:
        h = (h & (2 ** num_bits - 1)) ^ (h >> num_bits)
    # fold fraction part to bits no. 0...num_bits-1, aligning 2^-1 to bit num_bits-1.
    while f != 0:
        f *= 2 ** num_bits
        h ^= int(f)
        f -= int(f)
    return h if not neg else -h

Note that numbers outside the fix point scope described in a. above will well lead to non-zero digits bellow bit 8 as well.

like image 30
wweck Avatar answered Nov 01 '25 17:11

wweck



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!