I ran a few experiments and found a number of cases where Python's standard random and math libraries are faster than their numpy counterparts.
My impression is that the standard library tends to be about 10x faster for small-scale (scalar) operations, while numpy is much faster for large-scale (vector) operations. My guess is that numpy has some fixed per-call overhead which becomes dominant for small inputs.
My question is: is my intuition correct? And is it generally advisable to use the standard library rather than numpy for small (typically scalar) operations?
Examples are below.
import math
import random
import numpy as np
Log and exponential
%timeit math.log(10)
# 158 ns ± 6.16 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit np.log(10)
# 1.64 µs ± 93.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit math.exp(3)
# 146 ns ± 8.57 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit np.exp(3)
# 1.72 µs ± 78.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Generate normal distribution
%timeit random.gauss(0, 1)
# 809 ns ± 12.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.random.normal()
# 2.57 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Choosing a random element
%timeit random.choices([1,2,3], k=1)
# 1.56 µs ± 55.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.random.choice([1,2,3], size=1)
# 23.1 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Same with a numpy array
arr = np.array([1,2,3])
%timeit random.choices(arr, k=1)
# 1.72 µs ± 33.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.random.choice(arr, size=1)
# 18.4 µs ± 502 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
With a big array
arr = np.arange(10000)
%timeit random.choices(arr, k=1000)
# 401 µs ± 6.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.random.choice(arr, size=1000)
# 41.7 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
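To make the suspected fixed overhead visible, here is a minimal sketch of the crossover (my own illustration; the array size is an arbitrary assumption and the timings are machine-dependent):
import math
import numpy as np

n = 100_000
xs = list(range(1, n + 1))
arr = np.arange(1, n + 1)

# Scalar path: no numpy overhead per call, but a slow Python-level loop.
%timeit [math.log(x) for x in xs]

# Vectorised path: numpy's fixed overhead is paid once, then the
# element-wise work runs in compiled C and wins easily at this size.
%timeit np.log(arr)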
numpy is only really a performance improvement for large blocks of data. The overhead of making sure the memory blocks line up correctly before pouring an ndarray into a C-compiled numpy function will generally overwhelm any time benefit if the array isn't relatively large. This is why so many numpy questions are basically "How do I take this loopy code and make it fast?", and why it is considered a valid question in this tag, whereas nearly any other tag will toss you to Code Review before they get past the title.
So, yes, your observation is generalizable. Vectorizing is the whole point of numpy. numpy code that isn't vectorized is always slower than bare Python code, and is arguably just as "wrong" as cracking a single walnut with a jackhammer. Either find the right tool or get more nuts.
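As a concrete (hypothetical) instance of the "loopy code made fast" pattern described above:
import numpy as np

xs = np.random.rand(10_000)

# Loopy: pays numpy's per-call overhead once per element.
%timeit [np.log(x) for x in xs]

# Vectorised: pays the overhead once for the whole array.
%timeit np.log(xs)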
NumPy is used primarily for performance with arrays. This relies on the use of contiguous memory blocks and more efficient lower-level iteration. Applying a NumPy mathematical function to a scalar or calculating a single random number are not vectorisable operations. This explains the behaviour you are seeing.
See also What are the advantages of NumPy over regular Python lists?
And will it be in general advisable to use the standard library rather than NumPy for small (typically scalar) operations?
It's rare that the bottleneck for a program is caused by operations on scalars. In practice, the differences are negligible. So either way is fine. If you are already using NumPy there's no harm in continuing to use NumPy operations on scalars.
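One caveat, as my own illustration rather than a benchmark from the question: if you need many random numbers rather than one, the vectorised NumPy call is the right tool, for the same reasons as above:
import random
import numpy as np

n = 10_000

# Standard library: one Python-level call per sample.
%timeit [random.gauss(0, 1) for _ in range(n)]

# NumPy: a single call returning an array of n samples.
%timeit np.random.normal(size=n)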
It's worth making a special case of calculating random numbers. As you might expect, the random number selected via random vs NumPy may not be the same:
assert random.gauss(0, 1) == np.random.normal()
# AssertionError
assert random.choices(arr, k=1)[0] == np.random.choice(arr, size=1)[0]
# AssertionError
NumPy (like the standard library via random.seed) lets you seed its generator to make random numbers "predictable". For example, running the below script repeatedly will only ever generate the same result:
np.random.seed(0)
np.random.normal()
The same applies to np.random.choice. So there are differences in how the random numbers are derived and in the functionality available. For testing, or other, purposes you may wish to be able to produce consistent "random" numbers.
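For completeness, a sketch of seeding both generators; note that NumPy's newer Generator API (np.random.default_rng, available since NumPy 1.17) is the currently recommended way to get reproducible streams:
import random
import numpy as np

# Standard library: seed the module-level Mersenne Twister.
random.seed(0)
print(random.gauss(0, 1))    # same value on every run

# NumPy: a Generator object carries its own state, so seeding it
# does not affect unrelated code elsewhere in the program.
rng = np.random.default_rng(0)
print(rng.normal())          # same value on every run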