I was surprised that calling np.inner to compute a sum of squares was about 5x faster than calling np.sum on a pre-computed array of squares:

[image: sum of squares code]

Any insights into this behavior? I'm actually interested in a very fast implementation of a sum of squares, so thoughts on that are welcome, too.


Sum of Squares - np.inner vs squaring first, then summing

1 Answer

To check which modules np.inner and np.sum are implemented in, I typed:

>>> np.inner.__module__
'numpy.core.multiarray'
>>> np.sum.__module__
'numpy.core.fromnumeric'
>>> np.__file__
'/Users/uweschmitt/venv_so/lib/python3.5/site-packages/numpy/__init__.py'

If you inspect the actual files, you can see that numpy.core.multiarray is a pure C module, whereas numpy.core.fromnumeric is written in Python: np.sum first does some checks and conversions in Python, then calls a second Python function, and only then reaches the pure C implementation that performs the actual summation.

I suspect that this overhead from the Python interpreter explains the observed timing differences.
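One way to look at this overhead is to time both calls on arrays of different sizes. The following is only a sketch: the helper per_call_seconds, the chosen sizes, and the timings dictionary are my own illustrative names, not part of NumPy, and the absolute numbers will depend on the machine and the NumPy version.

```python
import timeit

import numpy as np


def per_call_seconds(func, *args, repeat=5, number=1000):
    """Return the best-of-`repeat` average time of one func(*args) call, in seconds."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Hypothetical array sizes, chosen only to contrast small and large inputs.
sizes = [10, 1000, 100000]
timings = {}

for n in sizes:
    a = np.random.random(n)
    squares = a * a  # pre-computed, so only the summation itself is timed
    t_inner = per_call_seconds(np.inner, a, a)
    t_sum = per_call_seconds(np.sum, squares)
    timings[n] = (t_inner, t_sum)
    print(f"n={n:>6}: np.inner {t_inner * 1e6:8.2f} µs   np.sum {t_sum * 1e6:8.2f} µs")
```

If the fixed per-call cost is the explanation, the gap between the two columns should matter most for the smallest arrays and shrink in relative terms as n grows.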

To test my assumption, I ran the timing with a larger array and got:

In [8]: a = np.random.random(1000000)
In [9]: %timeit np.inner(a, a)
1000 loops, best of 3: 673 µs per loop
In [10]: %timeit np.sum(a)
1000 loops, best of 3: 584 µs per loop

Now the run times are quite similar and change a little if you repeat the statements: sometimes np.sum wins, sometimes np.inner.

For the big array, the actual work of np.sum is done in C, and the constant-time overhead from the Python interpreter is negligible.
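Regarding a fast sum of squares in general, here is a small sketch that collects a few ways to compute it, so they can be checked against each other and then timed on your own data. The function name sum_of_squares_variants is made up for this example, and which variant is fastest is not something I claim here; it likely depends on the array size and the NumPy build.

```python
import numpy as np


def sum_of_squares_variants(a):
    """Compute the sum of squares of a 1-D array in several equivalent ways."""
    return {
        "inner": np.inner(a, a),             # inner product of a with itself
        "dot": np.dot(a, a),                 # same result for 1-D input
        "sum_of_product": np.sum(a * a),     # allocates a temporary array first
        "einsum": np.einsum("i,i->", a, a),  # explicit contraction over the index
    }


a = np.array([1.0, 2.0, 3.0])
results = sum_of_squares_variants(a)
for name, value in results.items():
    print(f"{name:>15}: {value}")
# Each variant yields 14.0 for this input (1 + 4 + 9).
```

Note that np.sum(a * a) has to create the intermediate array of squares on every call, while the other variants avoid that temporary.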

answered Oct 16 '22 04:10

rocksportrocker
