
numpy.random.multinomial at version 1.16.6 is 10x faster than later version

Here is the code and the results:

python -c "import numpy as np; from timeit import timeit; print('numpy version {}: {:.1f} seconds'.format(np.__version__, timeit('np.random.multinomial(1, [0.1, 0.2, 0.3, 0.4])', number=1000000, globals=globals())))"
numpy version 1.16.6:  1.5 seconds # 10x faster
numpy version 1.18.1: 15.5 seconds
numpy version 1.19.0: 17.4 seconds
numpy version 1.21.4: 15.1 seconds

Note that with a fixed random seed, the output is the same across the different NumPy versions:

python -c "import numpy as np; np.random.seed(0); print(np.__version__); print(np.random.multinomial(1, [0.1, 0.2, 0.3, 0.4], size=10000))" /tmp/tt

Any advice on why NumPy versions after 1.16.6 are 10x slower?

We have upgraded pandas to the latest version, 1.3.4, which requires a NumPy version newer than 1.16.6.

Winston Guo asked Dec 20 '21



2 Answers

TL;DR: this is a localized performance regression caused by the overhead of additional checks in the numpy.random.multinomial function. Very small inputs are strongly impacted because the checks account for most of the execution time.


Under the hood

A binary search on the Git commits of the NumPy code shows that the performance regression first appears in mid-April 2019. It can be reproduced with commit dd77ce3cb but not with 7e8e19f9a. There are some build issues for the commits in between, but with a quick fix we can show that commit 0f3dd0650 is the first one to cause the issue. The commit message says that it:

Extend multinomial to allow broadcasting
Fix zipf changes missed in NumPy
Enable 0 as valid input for hypergeometric

A deeper analysis of this commit shows that it modifies the multinomial function defined in the Cython file mtrand.pyx so that it performs the two additional checks marked below:

def multinomial(self, np.npy_intp n, object pvals, size=None):
    cdef np.npy_intp d, i, sz, offset
    cdef np.ndarray parr, mnarr
    cdef double *pix
    cdef int64_t *mnix
    cdef int64_t ni

    d = len(pvals)
    parr = <np.ndarray>np.PyArray_FROM_OTF(pvals, np.NPY_DOUBLE, np.NPY_ALIGNED)
    pix = <double*>np.PyArray_DATA(parr)
    check_array_constraint(parr, 'pvals', CONS_BOUNDED_0_1)   # <==========[HERE]
    if kahan_sum(pix, d-1) > (1.0 + 1e-12):
        raise ValueError("sum(pvals[:-1]) > 1.0")

    if size is None:
        shape = (d,)
    else:
        try:
            shape = (operator.index(size), d)
        except:
            shape = tuple(size) + (d,)

    multin = np.zeros(shape, dtype=np.int64)
    mnarr = <np.ndarray>multin
    mnix = <int64_t*>np.PyArray_DATA(mnarr)
    sz = np.PyArray_SIZE(mnarr)
    ni = n
    check_constraint(ni, 'n', CONS_NON_NEGATIVE)              # <==========[HERE]
    offset = 0
    with self.lock, nogil:
        for i in range(sz // d):
            random_multinomial(self._brng, ni, &mnix[offset], pix, d, self._binomial)
            offset += d

    return multin

These two checks are required for the code to be robust. However, they are currently pretty expensive considering their purpose.

Indeed, on my machine, the first check is responsible for ~75% of the overhead and the second for ~20%. The checks take only a few microseconds, but since your input is very small, the overhead is huge compared to the computation time.
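
To illustrate what the two checks guard against, here is a minimal sketch (run on a version that includes this commit; the exact error messages vary between releases):

import numpy as np

# pvals outside [0, 1] is rejected by the first check (CONS_BOUNDED_0_1)
try:
    np.random.multinomial(1, [0.5, -0.1, 0.6])
except ValueError as e:
    print("rejected pvals:", e)

# a negative n is rejected by the second check (CONS_NON_NEGATIVE)
try:
    np.random.multinomial(-1, [0.5, 0.5])
except ValueError as e:
    print("rejected n:", e)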


One workaround to fix this issue is to write a specific Numba function for this since your input array is very small. On my machine, np.random.multinomial in a trivial Numba function results in good performance.
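
For reference, a minimal sketch of such a Numba workaround (assuming Numba is installed; the function name draw_one is only illustrative):

import numba as nb
import numpy as np

@nb.njit
def draw_one(pvals):
    # Inside nopython mode the call goes to Numba's own implementation,
    # so the Python-level checks added in mtrand.pyx are not paid per call.
    return np.random.multinomial(1, pvals)

pvals = np.array([0.1, 0.2, 0.3, 0.4])
draw_one(pvals)         # the first call triggers JIT compilation
print(draw_one(pvals))  # later calls avoid the per-call overhead

Note that Numba keeps its own random state, separate from NumPy's global one, so calling np.random.seed outside the jitted function does not make these draws reproducible; seed inside a jitted function if that matters to you.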

Jérôme Richard answered Nov 10 '22


I checked some of the generators used under the hood and saw no significant change in their timings.

I guessed the difference might be due to some per-call overhead, because you are sampling only a single value, and that seems to be a good hypothesis. When I increased the size of the generated random samples to 1000, the difference between 1.16.6 and 1.19.2 (my current NumPy version) shrank to ~20%.

python -c "import numpy as np; from timeit import timeit; print('numpy version {}: {:.1f} seconds'.format(np.__version__, timeit('np.random.
multinomial(1, [0.1, 0.2, 0.3, 0.4], size=1000)', number=10000, globals=globals())))"
numpy version 1.16.6: 1.1 seconds
numpy version 1.19.2: 1.3 seconds

Note that both versions have this overhead; the newer version just has a much larger one. In both versions it is much faster to sample 1000 values at once than to sample 1 value 1000 times.
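
As a sketch of the two patterns (shapes printed only to make the comparison concrete):

import numpy as np

pvals = [0.1, 0.2, 0.3, 0.4]

# slow pattern: pay the per-call check overhead 1000 times
samples_loop = np.array([np.random.multinomial(1, pvals) for _ in range(1000)])

# fast pattern: pay it once and let the C loop draw all 1000 samples
samples_batch = np.random.multinomial(1, pvals, size=1000)

print(samples_loop.shape, samples_batch.shape)  # both (1000, 4)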

The code changed a lot between 1.16.6 and 1.17.0 (see for example this commit), so it is hard to analyse. Sorry I can't help you more - I suggest opening an issue on NumPy's GitHub.

dankal444 answered Nov 10 '22