I've got a numpy script that spends about 50% of its runtime in the following code:

s = numpy.dot(v1, v1)

where

v1 = v[1:]

and v is a 4000-element 1D ndarray of float64 stored in contiguous memory (v.strides is (8,)).

Any suggestions for speeding this up?
Edit: This is on Intel hardware. Here is the output of my numpy.show_config():
atlas_threads_info:
libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
library_dirs = ['/usr/local/atlas-3.9.16/lib']
language = f77
include_dirs = ['/usr/local/atlas-3.9.16/include']
blas_opt_info:
libraries = ['ptf77blas', 'ptcblas', 'atlas']
library_dirs = ['/usr/local/atlas-3.9.16/lib']
define_macros = [('ATLAS_INFO', '"\\"3.9.16\\""')]
language = c
include_dirs = ['/usr/local/atlas-3.9.16/include']
atlas_blas_threads_info:
libraries = ['ptf77blas', 'ptcblas', 'atlas']
library_dirs = ['/usr/local/atlas-3.9.16/lib']
language = c
include_dirs = ['/usr/local/atlas-3.9.16/include']
lapack_opt_info:
libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
library_dirs = ['/usr/local/atlas-3.9.16/lib']
define_macros = [('ATLAS_INFO', '"\\"3.9.16\\""')]
language = f77
include_dirs = ['/usr/local/atlas-3.9.16/include']
lapack_mkl_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
mkl_info:
NOT AVAILABLE
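For reference, a minimal timing sketch of the call in question (the random data and loop count below are illustrative, not from my actual script):

import numpy, timeit

# 4000-element contiguous float64 vector and a view that drops the first element.
v = numpy.random.rand(4000)
v1 = v[1:]

t = timeit.timeit(lambda: numpy.dot(v1, v1), number=100000)
print("per call: %.2f us" % (t / 100000 * 1e6))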
One option is Cython: by explicitly declaring the ndarray data type and the types of the variables involved, Cython can give drastic speed increases at runtime; speedups as large as 1250x have been reported for array processing.
Then again, NumPy and/or SciPy may already get your Python code running about as fast as native C++ (and the C++ version can take longer to develop).
Also note that matmul differs from dot in two important ways: multiplication by scalars is not allowed, and stacks of matrices are broadcast together as if the matrices were elements. For example:
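A quick sketch of both differences (the shapes here are chosen only for illustration):

import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul broadcasts stacks of matrices: result shape (2, 3, 5)
print(np.matmul(a, b).shape)

# dot contracts the last axis of a with the second-to-last axis of b,
# giving shape (2, 3, 2, 5) instead.
print(np.dot(a, b).shape)

# Scalars are fine with dot but rejected by matmul.
print(np.dot(3.0, np.ones(4)))      # works
# np.matmul(3.0, np.ones(4))        # raises ValueError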
Your arrays are not very big, so ATLAS probably isn't doing much. What are your timings for the following Fortran program? Assuming ATLAS isn't doing much, this should give you a sense of how fast dot() could be if there were no Python overhead. With gfortran -O3 I get 5 +/- 0.5 us per call.
program test
    real*8 :: x(4000), start, finish, s
    integer :: j
    integer, parameter :: jmax = 100000
    x(:) = 4.65
    s = 0.
    call cpu_time(start)
    do j = 1, jmax
        s = s + dot_product(x, x)
    enddo
    call cpu_time(finish)
    print *, (finish-start)/jmax * 1.e6, s
end program test
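For a rough Python-side counterpart of the same loop (assuming a 4000-element float64 vector as in the question; absolute numbers will depend on your BLAS build and hardware):

import numpy, timeit

# Same data and loop count as the Fortran version above.
x = numpy.full(4000, 4.65)
jmax = 100000
t = timeit.timeit(lambda: numpy.dot(x, x), number=jmax)
print("%.2f us per call" % (t / jmax * 1e6))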
Perhaps the culprit is copying of the arrays passed to dot.
As Sven said, the dot product relies on BLAS operations, which require arrays stored in contiguous C order. If both arrays passed to dot are C_CONTIGUOUS, you ought to see better performance.
Of course, if the two arrays passed to dot are truly 1D, e.g. shape (8,), then you should see both the C_CONTIGUOUS and F_CONTIGUOUS flags set to True; but if they are 2D, e.g. (1, 8), then you can see mixed order:
>>> w = NP.random.randint(0, 10, 100).reshape(100, 1)
>>> w.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
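If an operand is not C-contiguous, you can force a C-ordered copy before the call; a minimal sketch (the 500x500 arrays here are placeholders, and ascontiguousarray copies only when it has to):

import numpy as np

A = np.asfortranarray(np.random.rand(500, 500))   # Fortran-ordered input
B = np.random.rand(500, 500)

# Make sure both operands are C-contiguous before calling dot.
A_c = np.ascontiguousarray(A)
B_c = np.ascontiguousarray(B)
print(A_c.flags['C_CONTIGUOUS'], B_c.flags['C_CONTIGUOUS'])   # True True
result = np.dot(A_c, B_c)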
An alternative: use dgemm from BLAS, which is exposed through the module scipy.linalg.fblas. (The two arrays, A and B, are in Fortran order here, since fblas is used.)
from scipy.linalg import fblas as FB
X = FB.dgemm(alpha=1., a=A, b=B, trans_b=True)
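Note that in newer SciPy versions scipy.linalg.fblas has been removed and the BLAS wrappers live in scipy.linalg.blas instead; for a plain vector dot product, the level-1 routine ddot is the relevant one. A sketch, assuming a recent SciPy (whether it beats numpy.dot depends on how much call overhead dominates):

import numpy as np
from scipy.linalg import blas   # BLAS wrappers in newer SciPy

v = np.random.rand(4000)
v1 = v[1:]

# ddot is the double-precision BLAS level-1 dot product.
s = blas.ddot(v1, v1)
print(s, np.dot(v1, v1))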