I am baffled by this
def main():
    for i in xrange(2560000):
        a = [0.0, 0.0, 0.0]

main()

$ time python test.py
real    0m0.793s
Let's now see with numpy:
import numpy

def main():
    for i in xrange(2560000):
        a = numpy.array([0.0, 0.0, 0.0])

main()

$ time python test.py
real    0m39.338s
Holy CPU cycles batman!
Using numpy.zeros(3) improves, but it is still not enough IMHO:

$ time python test.py
real    0m5.610s
user    0m5.449s
sys     0m0.070s
numpy.version.version = '1.5.1'
If you are wondering whether the list creation is skipped as an optimization in the first example, it is not:
  5          19 LOAD_CONST               2 (0.0)
             22 LOAD_CONST               2 (0.0)
             25 LOAD_CONST               2 (0.0)
             28 BUILD_LIST               3
             31 STORE_FAST               1 (a)
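(For reference, that bytecode listing can be reproduced with the standard dis module; a minimal sketch:)

import dis

def main():
    for i in xrange(2560000):
        a = [0.0, 0.0, 0.0]

dis.dis(main)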
Numpy is optimised for large amounts of data. Give it a tiny length-3 array and, unsurprisingly, it performs poorly.
Consider a separate test:

import timeit

reps = 100

pythonTest = timeit.Timer('a = [0.] * 1000000')
numpyTest = timeit.Timer('a = numpy.zeros(1000000)', setup='import numpy')
uninitialised = timeit.Timer('a = numpy.empty(1000000)', setup='import numpy')
# empty simply allocates the memory. Thus the initial contents of the array
# is random noise

print 'python list:', pythonTest.timeit(reps), 'seconds'
print 'numpy array:', numpyTest.timeit(reps), 'seconds'
print 'uninitialised array:', uninitialised.timeit(reps), 'seconds'
And the output is
python list: 1.22042918205 seconds
numpy array: 1.05412316322 seconds
uninitialised array: 0.0016028881073 seconds
It would seem that it is the zeroing of the array that is taking all the time for numpy, so unless you need the array to be initialised, try using empty.
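If it helps, here is a minimal sketch of that pattern (the names are illustrative): the array comes from empty, and every element is written before it is ever read, so the uninitialised contents never matter.

import numpy

n = 1000000
a = numpy.empty(n)            # allocation only; initial contents are arbitrary
a[:] = numpy.arange(n) * 0.5  # every element is overwritten here before any read,
                              # so the garbage left by empty() is never observed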
Holy CPU cycles batman!, indeed. But please rather consider something very fundamental related to numpy: sophisticated linear-algebra-based functionality (like random numbers or singular value decomposition). Now, consider these seemingly simple calculations:
In []: A = rand(2560000, 3)

In []: %timeit rand(2560000, 3)
1 loops, best of 3: 296 ms per loop

In []: %timeit u, s, v = svd(A, full_matrices=False)
1 loops, best of 3: 571 ms per loop
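(The session above assumes rand and svd are already in the namespace, as under IPython's pylab mode. For reference, a standalone sketch of roughly the same measurement using timeit; the exact numbers depend on the machine and the BLAS/LAPACK build numpy links against.)

import timeit

setup = ('from numpy.random import rand\n'
         'from numpy.linalg import svd\n'
         'A = rand(2560000, 3)')

print 'rand:', min(timeit.repeat('rand(2560000, 3)', setup=setup, number=1, repeat=3)), 's'
print 'svd :', min(timeit.repeat('svd(A, full_matrices=False)', setup=setup, number=1, repeat=3)), 's'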
and please trust me that this kind of performance will not be beaten significantly by any package currently available.
So, please describe your real problem, and I'll try to figure out a decent numpy-based solution for it.
Update:
Here is some simple code for ray-sphere intersection:
import numpy as np

def mag(X):
    # magnitude of each column vector of X
    return (X ** 2).sum(0) ** .5

def closest(R, c):
    # closest point on each ray to the center, and its distance from the center
    P = np.dot(c.T, R) * R
    return P, mag(P - c)

def intersect(R, P, h, r):
    # intersection of rays with the sphere
    return P - (h * (2 * r - h)) ** .5 * R

# set up
c, r = np.array([10, 10, 10])[:, None], 2.  # center, radius
n = int(5e5)
R = np.random.rand(3, n)                    # some random rays in the first octant
R = R / mag(R)                              # normalized to unit length

# find rays which will intersect the sphere
P, b = closest(R, c)
wi = b <= r

# and for those which will, find the intersection
X = intersect(R[:, wi], P[:, wi], r - b[wi], r)
Apparently we calculated correctly:
In []: allclose(mag(X - c), r)
Out[]: True
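For intuition about the (h * (2 * r - h)) ** .5 term, here is a scalar sketch of the same geometry for a single ray (the R1/c1 names are illustrative, not part of the code above): with b the distance from the center to the ray and h = r - b, the expression equals sqrt(r**2 - b**2), i.e. the distance from the foot of the perpendicular back to the entry point on the sphere.

import numpy as np

# One unit-length ray from the origin, roughly towards the sphere at c = (10, 10, 10), r = 2
R1 = np.array([1.0, 1.0, 0.9])
R1 /= np.linalg.norm(R1)
c1, r1 = np.array([10.0, 10.0, 10.0]), 2.0

P1 = np.dot(c1, R1) * R1                      # closest point on the ray to the center
b1 = np.linalg.norm(P1 - c1)                  # distance from the center to the ray
h1 = r1 - b1
X1 = P1 - np.sqrt(h1 * (2 * r1 - h1)) * R1    # step back by sqrt(r**2 - b**2) along the ray

print(np.allclose(np.linalg.norm(X1 - c1), r1))   # True: X1 lies on the sphere surface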
And some timings:
In []: %timeit P, b = closest(R, c)
10 loops, best of 3: 93.4 ms per loop

In []: n / 0.0934
Out[]: 5353319  # => more than 5 million detections of possible intersections / s

In []: %timeit X = intersect(R[:, wi], P[:, wi], r - b[wi], r)
10 loops, best of 3: 32.7 ms per loop

In []: X.shape[1] / 0.0327
Out[]: 874037  # => almost 1 million actual intersections / s
These timings were done on a very modest machine. On a modern machine, a significant speed-up can still be expected.
Anyway, this is only a short demonstration of how to code with numpy.