I'm getting some efficiency test results that I can't explain.
I want to assemble an array B whose i-th slice is B[i,:,:] = A[i,:,:].dot(x), where each A[i,:,:] is a 2D matrix, and so is x.
I can do this three ways. To test performance, I make random matrices (with numpy.random.randn) of shapes A = (10, 1000, 1000) and x = (1000, 1200). I get the following timing results:
(1) single multidimensional dot product
B = A.dot(x)
total time: 102.361 s
(2) looping through i and performing 2D dot products
# initialize B = np.zeros([dim1, dim2, dim3])
for i in range(A.shape[0]):
    B[i,:,:] = A[i,:,:].dot(x)
total time: 0.826 s
(3) numpy.einsum
B3 = np.einsum("ijk, kl -> ijl", A, x)
total time: 8.289 s
So option (2) is the fastest by far. But considering just (1) and (2), I don't see what accounts for the big difference between them. How can looping through and doing 2D dot products be ~124 times faster? Both use numpy.dot. Any insights?
I include the code used for the above results just below:
import numpy as np
import numpy.random as npr
import time
dim1, dim2, dim3 = 10, 1000, 1200
A = npr.randn(dim1, dim2, dim2)
x = npr.randn(dim2, dim3)
# consider three ways of assembling the same matrix B: B1, B2, B3
t = time.time()
B1 = np.dot(A,x)
td1 = time.time() - t
print "a single dot product of A [shape = (%d, %d, %d)] with x [shape = (%d, %d)] completes in %.3f s" \
% (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td1)
B2 = np.zeros([A.shape[0], x.shape[0], x.shape[1]])
t = time.time()
for i in range(A.shape[0]):
B2[i,:,:] = np.dot(A[i,:,:], x)
td2 = time.time() - t
print "taking %d dot products of 2D dot products A[i,:,:] [shape = (%d, %d)] with x [shape = (%d, %d)] completes in %.3f s" \
% (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td2)
t = time.time()
B3 = np.einsum("ijk, kl -> ijl", A, x)
td3 = time.time() - t
print "using np.einsum, it completes in %.3f s" % td3
Because np.dot executes the actual arithmetic operations and the enclosing loop in compiled code, which is much faster than the Python interpreter.
Because a NumPy array is densely packed in memory due to its homogeneous type, it also frees memory faster. Overall, a task executed with NumPy is often around 5 to 100 times faster than the same task done with a standard Python list, which is a significant leap in speed.
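As a minimal sketch of the kind of gap being described (the size here is illustrative, and the ratio varies by machine and BLAS), compare a pure-Python inner-product loop against a single np.dot call:

import time
import numpy as np

n = 1_000_000
a = np.random.randn(n)
b = np.random.randn(n)
a_list, b_list = a.tolist(), b.tolist()

# pure Python: every multiply and add goes through the interpreter
t = time.time()
s_py = sum(a_list[i] * b_list[i] for i in range(n))
t_py = time.time() - t

# NumPy: the same arithmetic in one compiled call
t = time.time()
s_np = np.dot(a, b)
t_np = time.time() - t

print("python loop: %.4f s, np.dot: %.4f s, ratio: %.1fx" % (t_py, t_np, t_py / t_np))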
This function returns the dot product of two arrays. For 2-D arrays, it is equivalent to matrix multiplication. For 1-D arrays, it is the inner product of the vectors. For N-dimensional arrays, it is a sum product over the last axis of a and the second-to-last axis of b.
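A minimal sketch of what that N-dimensional rule means for the shapes in the question (same structure, smaller sizes): the 3D dot yields exactly the stack of 2D products, just not necessarily via the same code path, as the timings show:

import numpy as np

A = np.random.randn(10, 50, 50)   # same structure as the question, smaller sizes
x = np.random.randn(50, 60)

# sum product over the last axis of A and the second-to-last axis of x
B1 = np.dot(A, x)                 # shape (10, 50, 60)

# the per-slice 2D products give the same values
B2 = np.stack([np.dot(A[i], x) for i in range(A.shape[0])])
print(B1.shape, np.allclose(B1, B2))  # (10, 50, 60) True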
With smaller dims 10,100,200 (i.e. N, M, L = 10, 100, 200, with A and x shaped accordingly), I get a similar ranking:
In [355]: %%timeit
.....: B=np.zeros((N,M,L))
.....: for i in range(N):
.....:     B[i,:,:]=np.dot(A[i,:,:],x)
.....:
10 loops, best of 3: 22.5 ms per loop
In [356]: timeit np.dot(A,x)
10 loops, best of 3: 44.2 ms per loop
In [357]: timeit np.einsum('ijk,km->ijm',A,x)
10 loops, best of 3: 29 ms per loop
In [367]: timeit np.dot(A.reshape(-1,M),x).reshape(N,M,L)
10 loops, best of 3: 22.1 ms per loop
In [375]: timeit np.tensordot(A,x,(2,0))
10 loops, best of 3: 22.2 ms per loop
The iterative approach is faster, though not by as much as in your case.
This is probably true as long as the iterating dimension is small compared to the other ones: in that case the overhead of iteration (function calls, etc.) is small compared to the calculation time, and doing all the values at once uses more memory.
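A minimal timeit harness for probing that claim (the shapes are my assumption; absolute numbers will vary by machine and BLAS), growing only the iterating dimension:

import timeit
import numpy as np

M, L = 100, 200
for N in (10, 100, 1000):          # grow only the iterating dimension
    A = np.random.randn(N, M, M)
    x = np.random.randn(M, L)

    def looped():
        B = np.zeros((N, M, L))
        for i in range(N):
            B[i, :, :] = np.dot(A[i, :, :], x)
        return B

    t_loop = min(timeit.repeat(looped, number=10, repeat=3))
    t_dot = min(timeit.repeat(lambda: np.dot(A, x), number=10, repeat=3))
    print("N=%4d  loop: %.4f s  np.dot: %.4f s" % (N, t_loop, t_dot))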
I tried a dot variation where I reshaped A into 2D, thinking that dot does that kind of reshaping internally. I'm surprised that it is actually the fastest. tensordot is probably doing the same reshaping (its code is Python and readable).
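A minimal sketch of that reshape variation (small assumed shapes), collapsing the leading axes so one 2D matrix product does all the work:

import numpy as np

N, M, L = 10, 100, 200
A = np.random.randn(N, M, M)
x = np.random.randn(M, L)

# collapse (N, M, M) -> (N*M, M) so the whole job is a single 2D matrix product,
# then restore the leading axis afterwards
B = np.dot(A.reshape(-1, M), x).reshape(N, M, L)

# matches the per-slice loop
B_loop = np.stack([np.dot(A[i], x) for i in range(N)])
print(np.allclose(B, B_loop))  # True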
einsum sets up a 'sum of products' iteration involving 4 variables, i, j, k, m - that is, dim1*dim2*dim2*dim3 steps with the C-level nditer. So the more indices you have, the larger the iteration space.
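One hedged footnote: on NumPy 1.12 and later, einsum accepts an optimize flag that plans the contraction and can dispatch it to BLAS rather than walking the full nditer space, which may close much of this gap. A minimal sketch:

import numpy as np

A = np.random.randn(10, 100, 100)
x = np.random.randn(100, 200)

# plain einsum: an element-by-element 'sum of products' loop via nditer
B_plain = np.einsum("ijk, kl -> ijl", A, x)

# optimize=True lets einsum pick a contraction order and use BLAS where possible
B_opt = np.einsum("ijk, kl -> ijl", A, x, optimize=True)
print(np.allclose(B_plain, B_opt))  # True

On similarly recent NumPy, np.matmul (the @ operator) also broadcasts the 2D product over the leading axis, which is arguably the idiomatic spelling of this operation today.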