Here is the code and the result:
python -c "import numpy as np; from timeit import timeit; print('numpy version {}: {:.1f} seconds'.format(np.__version__, timeit('np.random.multinomial(1, [0.1, 0.2, 0.3, 0.4])', number=1000000, globals=globals())))"
numpy version 1.16.6: 1.5 seconds # 10x faster
numpy version 1.18.1: 15.5 seconds
numpy version 1.19.0: 17.4 seconds
numpy version 1.21.4: 15.1 seconds
Note that with a fixed random seed, the output is the same across the different NumPy versions:
python -c "import numpy as np; np.random.seed(0); print(np.__version__); print(np.random.multinomial(1, [0.1, 0.2, 0.3, 0.4], size=10000))" /tmp/tt
Any advice on why NumPy versions after 1.16.6 are 10x slower?
We have upgraded pandas to the latest version, 1.3.4, which requires a NumPy version newer than 1.16.6.
np.sin is slower than math.sin when processing a single value, but faster when working with a large array. It's likely the same applies to random.
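A quick illustration of this per-call overhead (my own sketch, not from the thread; the variable names are arbitrary):

import math
import numpy as np
from timeit import timeit

x = 0.5
arr = np.random.rand(1_000_000)

print(timeit('math.sin(x)', number=1_000_000, globals=globals()))  # scalar: math.sin has little overhead
print(timeit('np.sin(x)', number=1_000_000, globals=globals()))    # scalar: np.sin pays per-call overhead
print(timeit('np.sin(arr)', number=10, globals=globals()))         # array: np.sin is fast per element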
np.random.seed(number) sets what NumPy calls the global random seed, affecting all subsequent uses of the np.random module. With the same seed, a pseudo-random number generator (PRNG) such as the one behind np.random.rand() produces a deterministic sequence rather than truly random output, which is why the results above match across versions.
The multinomial distribution is a multivariate generalization of the binomial distribution. Take an experiment with one of p possible outcomes; an example of such an experiment is throwing a die, where the outcome can be 1 through 6. Each sample drawn from the distribution represents n such experiments.
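For instance, a minimal usage sketch mirroring the dice example from the NumPy docs:

import numpy as np

throws = np.random.multinomial(20, [1/6.] * 6)  # throw a fair die 20 times
print(throws)        # e.g. [4 1 7 5 2 1]; counts per face, summing to 20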
TL;DR: this is a local performance regression caused by the overhead of additional checks in the numpy.random.multinomial function. Very small arrays are strongly impacted due to the relative execution time of the required checks.
A binary search on the Git commits of the NumPy code shows that the performance regression first appears in mid-April 2019. It can be reproduced in commit dd77ce3cb but not in 7e8e19f9a. There are some build issues for the commits in between, but with a quick fix we can show that commit 0f3dd0650 is the first to cause the issue. The commit message says that it:
Extend multinomial to allow broadcasting
Fix zipf changes missed in NumPy
Enable 0 as valid input for hypergeometric
A deeper analysis of this commit shows that it modifies the multinomial function defined in the Cython file mtrand.pyx to perform the following two additional checks:
def multinomial(self, np.npy_intp n, object pvals, size=None):
    cdef np.npy_intp d, i, sz, offset
    cdef np.ndarray parr, mnarr
    cdef double *pix
    cdef int64_t *mnix
    cdef int64_t ni

    d = len(pvals)
    parr = <np.ndarray>np.PyArray_FROM_OTF(pvals, np.NPY_DOUBLE, np.NPY_ALIGNED)
    pix = <double*>np.PyArray_DATA(parr)
    check_array_constraint(parr, 'pvals', CONS_BOUNDED_0_1)  # <==========[HERE]
    if kahan_sum(pix, d-1) > (1.0 + 1e-12):
        raise ValueError("sum(pvals[:-1]) > 1.0")

    if size is None:
        shape = (d,)
    else:
        try:
            shape = (operator.index(size), d)
        except:
            shape = tuple(size) + (d,)

    multin = np.zeros(shape, dtype=np.int64)
    mnarr = <np.ndarray>multin
    mnix = <int64_t*>np.PyArray_DATA(mnarr)
    sz = np.PyArray_SIZE(mnarr)
    ni = n
    check_constraint(ni, 'n', CONS_NON_NEGATIVE)  # <==========[HERE]
    offset = 0
    with self.lock, nogil:
        for i in range(sz // d):
            random_multinomial(self._brng, ni, &mnix[offset], pix, d, self._binomial)
            offset += d

    return multin
These two checks are required for the code to be robust, but they are currently quite expensive relative to their purpose. Indeed, on my machine, the first check is responsible for ~75% of the overhead and the second for ~20%. The checks take only a few microseconds, but since your input is very small, the overhead is huge compared to the computation time.
One workaround is to write a specific Numba function, since your input array is very small. On my machine, calling np.random.multinomial in a trivial Numba function results in good performance.
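A minimal sketch of what such a wrapper could look like (assuming Numba's nopython mode supports np.random.multinomial; the wrapper name is my own):

import numpy as np
import numba as nb

@nb.njit
def multinomial_njit(n, pvals):
    # Jitted wrapper: sidesteps the expensive Python-level argument checks.
    return np.random.multinomial(n, pvals)

print(multinomial_njit(1, np.array([0.1, 0.2, 0.3, 0.4])))

Note that pvals has to be passed as a NumPy array rather than a plain list, and that Numba maintains its own RNG state, separate from NumPy's global one, so seeding behaves differently inside jitted code.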
I checked some of the generators used under the hood and saw no significant change in their timings. I guessed the difference might be due to some overhead, because you are sampling only a single value, and that seems to be a good hypothesis: when I increased the size of the generated random samples to 1000, the difference between 1.16.6 and 1.19.2 (my current NumPy version) shrank to ~20%.
python -c "import numpy as np; from timeit import timeit; print('numpy version {}: {:.1f} seconds'.format(np.__version__, timeit('np.random.
multinomial(1, [0.1, 0.2, 0.3, 0.4], size=1000)', number=10000, globals=globals())))"
numpy version 1.16.6: 1.1 seconds
numpy version 1.19.2: 1.3 seconds
Note that both versions have this overhead; the newer version simply has much more of it. In both versions it is much faster to sample 1000 values once than to sample 1 value 1000 times (see the sketch below).
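A minimal batching sketch along those lines (my own, not from the thread): draw all the samples in one call, then consume them row by row.

import numpy as np

pvals = [0.1, 0.2, 0.3, 0.4]
batch = np.random.multinomial(1, pvals, size=1000)  # one call, 1000 one-hot draws
for row in batch[:3]:
    print(row)  # each row is a single draw, e.g. [0 0 1 0]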
The code changed a lot between 1.16.6 and 1.17.0 (see for example this commit), so it is hard to analyse. Sorry that I can't help you more; I suggest opening an issue on NumPy's GitHub.