
Multiply high order matrices with numpy

I created this toy problem that reflects my much bigger problem:

import numpy as np

ind = np.ones((3, 2, 4))                       # shape=(3, 2, 4)
dist = np.array([[0.1, 0.3], [1, 2], [0, 1]]) # shape=(3, 2)

ans = np.array([np.dot(dist[i], ind[i]) for i in range(dist.shape[0])]) # shape=(3, 4)
print(ans)

""" prints:
   [[ 0.4  0.4  0.4  0.4]
    [ 3.   3.   3.   3. ]
    [ 1.   1.   1.   1. ]]
"""

I want to do this as fast as possible; since the operation is heavy and my matrices are quite big, using numpy's built-in functions to calculate ans should be the best approach.

I saw this post, but the shapes are different and I cannot understand which axes I should use for this problem. However, I'm certain that tensordot should have the answer. Any suggestions?

EDIT: I accepted @ajcr's answer, but please read my own answer as well; it may help others...

asked Jun 13 '15 by omerbp



2 Answers

Following @ajcr's great answer, I wanted to determine which method is fastest, so I used timeit:

import timeit

setup_code = """
import numpy as np
i, j, k = (300, 200, 400)
ind = np.ones((i, j, k))    # shape=(300, 200, 400)
dist = np.random.rand(i, j) # shape=(300, 200)
"""

basic = "np.array([np.dot(dist[l], ind[l]) for l in range(dist.shape[0])])"
einsum = "np.einsum('ijk,ij->ik', ind, dist)"
tensor = "np.tensordot(ind, dist, axes=[1, 1])[0].T"

print("tensor - total time:", min(timeit.repeat(stmt=tensor, setup=setup_code, number=10, repeat=3)))
print("basic - total time:", min(timeit.repeat(stmt=basic, setup=setup_code, number=10, repeat=3)))
print("einsum - total time:", min(timeit.repeat(stmt=einsum, setup=setup_code, number=10, repeat=3)))

The surprising results were:

tensor - total time: 6.59519493952
basic - total time: 0.159871203461
einsum - total time: 0.263569731028

So using tensordot was clearly the wrong way to do it (not to mention the memory errors it raises on bigger examples, just as @ajcr stated).

Since this example was small, I increased the matrix sizes to i,j,k = (3000,200,400), flipped the statement order just to be sure it had no effect, and set up another test with a higher number of repetitions:

print "einsum - total time:", min(timeit.repeat(stmt=einsum,setup=setup_code,number=50,repeat=3))
print "basic - total time:", min(timeit.repeat(stmt=basic,setup=setup_code,number=50,repeat=3))

The results were consistent with the first run:

einsum - total time: 13.3184077671
basic - total time: 8.44810031351

However, testing another type of size growth, i,j,k = (30000,20,40), led to the following results:

einsum - total time: 0.325594117768
basic - total time: 0.926416766397

The explanation for these results: basic runs a Python-level loop of length i, so when i grows to 30000 the per-iteration overhead of the loop and the tiny np.dot calls dominates, and einsum, which performs the whole reduction in compiled code, wins; when i is small and each np.dot call is one large BLAS matrix product, the loop is cheap and basic wins.

The moral is: when looking for the fastest solution to a specific problem, try to generate data that is as similar to the original data as possible, in terms of types and shapes. In my case i is much smaller than j,k, so I stayed with the ugly loop version, which is also the fastest in this case.
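
If you want to repeat this experiment for your own shapes, the sweep is easy to package. Here's a minimal sketch (compare is just an illustrative helper name, not from the code above):

import timeit

def compare(i, j, k, number=10, repeat=3):
    # Build setup code matching the shapes of the real data.
    setup = (
        "import numpy as np\n"
        f"ind = np.ones(({i}, {j}, {k}))\n"
        f"dist = np.random.rand({i}, {j})\n"
    )
    stmts = {
        "basic": "np.array([np.dot(dist[l], ind[l]) for l in range(dist.shape[0])])",
        "einsum": "np.einsum('ijk,ij->ik', ind, dist)",
    }
    for name, stmt in stmts.items():
        t = min(timeit.repeat(stmt=stmt, setup=setup, number=number, repeat=repeat))
        print(name, "- total time:", t)

compare(3000, 200, 400) # few iterations, large dot products: basic tends to win
compare(30000, 20, 40)  # many iterations, tiny dot products: einsum tends to win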

answered by omerbp


You could use np.einsum to do the operation since it allows for very careful control over which axes are multiplied and which are summed:

>>> np.einsum('ijk,ij->ik', ind, dist)
array([[ 0.4,  0.4,  0.4,  0.4],
       [ 3. ,  3. ,  3. ,  3. ],
       [ 1. ,  1. ,  1. ,  1. ]])

The function multiplies the entries in the first axis of ind with the entries in the first axis of dist (subscript 'i'). Ditto for the second axis of each array (subscript 'j'). Instead of returning a 3D array, we tell einsum to sum the values along axis 'j' by omitting it from the output subscripts, thereby returning a 2D array.
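
If it helps to see what those subscripts do, the same reduction can be spelled out with plain broadcasting (a minimal sketch, not part of the original answer): give dist a trailing length-1 axis so it lines up with ind, multiply elementwise, and sum away the 'j' axis.

import numpy as np

ind = np.ones((3, 2, 4))
dist = np.array([[0.1, 0.3], [1, 2], [0, 1]])

# dist[:, :, None] has shape (3, 2, 1) and broadcasts against ind's (3, 2, 4);
# summing over axis 1 removes the 'j' subscript, just like 'ijk,ij->ik'.
via_broadcast = (ind * dist[:, :, None]).sum(axis=1)

assert np.allclose(via_broadcast, np.einsum('ijk,ij->ik', ind, dist))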


np.tensordot is more difficult to apply to this problem because it automatically sums over every pair of axes it multiplies, with no way to keep a matched "batch" axis: here we want to match both the 'i' and 'j' axes, but sum over only 'j'.

Writing np.tensordot(ind, dist, axes=[1, 1]) (as in the answer you linked to) computes all the values you need, but returns a 3D array with shape (3, 4, 3) holding the products for every pairing of rows of ind and dist; the entries you want lie on the diagonal of its first and last axes. If you can afford the memory cost of the larger array, you could use:

np.tensordot(ind, dist, axes=[1, 1])[0].T

This gives the correct result for the toy data, though only because every ind[i] slice is identical there; with general data you would have to extract the diagonal instead. Either way, tensordot creates a much larger-than-necessary array first, so einsum seems to be the better option.
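
To make that concrete, here's a quick check (my own sketch, using np.diagonal) showing that with non-constant data the [0].T shortcut breaks, while the diagonal of the first and last axes recovers the einsum result:

import numpy as np

rng = np.random.default_rng(0)
ind = rng.random((3, 2, 4)) # no longer all ones
dist = np.array([[0.1, 0.3], [1, 2], [0, 1]])

want = np.einsum('ijk,ij->ik', ind, dist)

t = np.tensordot(ind, dist, axes=[1, 1])      # shape (3, 4, 3): products for every row pairing
via_diag = np.diagonal(t, axis1=0, axis2=2).T # keep only matching rows, back to (3, 4)

assert np.allclose(via_diag, want)
assert not np.allclose(t[0].T, want) # the shortcut only worked because ind was constant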

answered by Alex Riley