How to speed up a vector cross product calculation

Tags: performance, python, outer-join, numpy

Hi, I'm relatively new here and am trying to do some calculations with NumPy. One particular calculation takes a long time to run, and I can't work out a faster way to achieve the same thing.

Basically, it's part of a ray-triangle intersection algorithm, and I need to calculate all the vector cross products between two matrices of different sizes.

The code I was using was:

allhvals1 = numpy.cross( dirvectors[:,None,:], trivectors2[None,:,:] )

where dirvectors is an array of n xyz vectors (shape (n, 3)) and trivectors2 is an array of m xyz vectors (shape (m, 3)). allhvals1 is the resulting array of cross products, of shape (n, m, 3): the cross product of every vector in one array with every vector in the other. This works but is very slow. Hope that makes sense. Each of n and m varies from roughly 1 to 4000 depending on parameters (I basically chunk the dirvectors depending on their size).
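A minimal sketch of the setup described above, assuming dirvectors has shape (n, 3) and trivectors2 has shape (m, 3). The helper chunked_pairwise_cross and its chunk_size of 500 are hypothetical illustrations of the "chunk the dirvectors" idea, not the asker's actual code, and random data stands in for the real ray and triangle vectors.

```python
import numpy as np

def chunked_pairwise_cross(dirvectors, trivectors2, chunk_size=500):
    """Hypothetical helper: all-pairs cross products, computed in row blocks.

    dirvectors: (n, 3), trivectors2: (m, 3) -> result: (n, m, 3)
    Only chunk_size rows of dirvectors go into each np.cross call, which
    bounds the size of the temporaries np.cross creates per call.
    """
    n, m = dirvectors.shape[0], trivectors2.shape[0]
    out = np.empty((n, m, 3), dtype=np.result_type(dirvectors, trivectors2))
    for start in range(0, n, chunk_size):
        block = dirvectors[start:start + chunk_size]
        # (k, 1, 3) x (1, m, 3) broadcasts to (k, m, 3)
        out[start:start + len(block)] = np.cross(block[:, None, :], trivectors2[None, :, :])
    return out

dirvectors = np.random.rand(1200, 3)
trivectors2 = np.random.rand(300, 3)
allhvals1 = chunked_pairwise_cross(dirvectors, trivectors2)
print(allhvals1.shape)  # (1200, 300, 3)
```

Chunking keeps each call's temporaries smaller, but the total arithmetic is unchanged, so on its own it does not make the calculation faster.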

Any advice appreciated. Unfortunately my matrix math is somewhat flaky.


asked Jan 03 '14 16:01

user1942439

1 Answer

If you look at the source code of np.cross, it basically moves the xyz dimension to the front of the shape tuple for all arrays, and then has the calculation of each of the components spelled out like this:

x = a[1]*b[2] - a[2]*b[1]
y = a[2]*b[0] - a[0]*b[2]
z = a[0]*b[1] - a[1]*b[0]

In your case, each of those products broadcasts to a full (n, m) array, so computing every component requires allocating several huge temporary arrays, and the overall behavior is not very efficient.
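To put "huge" into numbers: with n = m = 4000, every (n, m) float64 temporary takes 4000 × 4000 × 8 bytes = 128 MB, and as the formulas are written above each component creates two product temporaries plus the result of the subtraction. A minimal sketch of one way to cut down on those allocations, assuming floating-point inputs: write each component straight into a preallocated (n, m, 3) output using the ufuncs' out= argument, and reuse a single (n, m) scratch buffer. The helper name pairwise_cross_preallocated is hypothetical, and this variant has not been benchmarked against the einsum approaches; it only illustrates where the temporaries come from.

```python
import numpy as np

def pairwise_cross_preallocated(u, v):
    """Hypothetical helper: all-pairs cross products of u (n, 3) and v (m, 3).

    Each component is written directly into a preallocated (n, m, 3) output,
    and one (n, m) scratch buffer is reused for the second product.
    """
    n, m = u.shape[0], v.shape[0]
    out = np.empty((n, m, 3), dtype=np.result_type(u, v))
    scratch = np.empty((n, m), dtype=out.dtype)
    a = [u[:, c, None] for c in range(3)]  # columns of u, shape (n, 1)
    b = [v[None, :, c] for c in range(3)]  # columns of v, shape (1, m)
    # component i = a[j]*b[k] - a[k]*b[j], matching the x, y, z formulas above
    for i, (j, k) in enumerate([(1, 2), (2, 0), (0, 1)]):
        comp = out[..., i]  # strided (n, m) view into out
        np.multiply(a[j], b[k], out=comp)
        np.multiply(a[k], b[j], out=scratch)
        np.subtract(comp, scratch, out=comp)
    return out

u = np.random.rand(1000, 3)
v = np.random.rand(2000, 3)
s = pairwise_cross_preallocated(u, v)
print(np.allclose(s, np.cross(u[:, None, :], v[None, :, :])))  # True
```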

Let's set up some test data:

import numpy as np

u = np.random.rand(1000, 3)
v = np.random.rand(2000, 3)

In [13]: %timeit s1 = np.cross(u[:, None, :], v[None, :, :])
1 loops, best of 3: 591 ms per loop

We can try to compute it using Levi-Civita symbols and np.einsum as follows:

eijk = np.zeros((3, 3, 3))
eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1
eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1

In [14]: %timeit s2 = np.einsum('ijk,uj,vk->uvi', eijk, u, v)
1 loops, best of 3: 706 ms per loop

In [15]: np.allclose(s1, s2)
Out[15]: True
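For reference, this is what the three-operand subscripts 'ijk,uj,vk->uvi' compute, written with 1-based indices (the code uses 0-based ones) and with p, q standing in for the u, v row indices so they don't clash with the array names. The nonzero entries of ε are the same six that eijk sets above.

```latex
\varepsilon_{ijk} =
\begin{cases}
+1 & (i,j,k) \in \{(1,2,3),\,(2,3,1),\,(3,1,2)\} \\
-1 & (i,j,k) \in \{(1,3,2),\,(3,2,1),\,(2,1,3)\} \\
0  & \text{otherwise}
\end{cases}
\qquad
s_{pqi} = \sum_{j=1}^{3} \sum_{k=1}^{3} \varepsilon_{ijk}\, u_{pj}\, v_{qk}
        = (\mathbf{u}_p \times \mathbf{v}_q)_i
```

For i = 1 only (j, k) = (2, 3) and (3, 2) survive, giving s_pq1 = u_p2 v_q3 - u_p3 v_q2, which is the x line of the np.cross formulas.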

So while it works, it is actually slower. The problem is that np.einsum handles more than two operands poorly, but has optimized code paths for two or fewer. So we can try to rewrite it as two nested calls and see if that helps:

In [16]: %timeit s3 = np.einsum('iuk,vk->uvi', np.einsum('ijk,uj->iuk', eijk, u), v)
10 loops, best of 3: 63.4 ms per loop

In [17]: np.allclose(s1, s3)
Out[17]: True

Bingo! Close to an order-of-magnitude improvement (591 ms down to 63.4 ms).
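One way to see why the split works: the inner call np.einsum('ijk,uj->iuk', eijk, u) contracts ε with each row of u exactly once, producing for every vector its 3 × 3 skew-symmetric "cross-product matrix" (stored with axes (i, u, k)). The outer call is then just a batched matrix-vector product. The sketch below checks that interpretation and shows an equivalent formulation with matmul (@); skew_matrices and pairwise_cross_matmul are hypothetical helper names, and the matmul variant has not been timed here. On NumPy 1.12 or later, np.einsum also accepts optimize=True, which lets it pick a pairwise contraction order for the three-operand expression itself; that argument does not exist in NumPy 1.11.0.

```python
import numpy as np

# Levi-Civita tensor, built as in the answer
eijk = np.zeros((3, 3, 3))
eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1
eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1

def skew_matrices(u):
    """Hypothetical helper: (n, 3) -> (n, 3, 3) stack of cross-product
    matrices, so that skew_matrices(u)[p] @ w equals np.cross(u[p], w)."""
    x, y, z = u[:, 0], u[:, 1], u[:, 2]
    zero = np.zeros_like(x)
    return np.stack([
        np.stack([zero, -z, y], axis=-1),
        np.stack([z, zero, -x], axis=-1),
        np.stack([-y, x, zero], axis=-1),
    ], axis=1)

def pairwise_cross_matmul(u, v):
    """Hypothetical helper: s[p, q] = cross(u[p], v[q]) via batched matmul."""
    K = skew_matrices(u)  # (n, 3, 3)
    # (m, 3) @ (n, 3, 3) broadcasts to (n, m, 3): s[p, q, i] = sum_k K[p, i, k] * v[q, k]
    return v @ K.swapaxes(1, 2)

u = np.random.rand(1000, 3)
v = np.random.rand(2000, 3)

# The inner einsum yields the same matrices, with axes ordered (i, u, k)
inner = np.einsum('ijk,uj->iuk', eijk, u)
print(np.allclose(inner.transpose(1, 0, 2), skew_matrices(u)))  # True

s3 = np.einsum('iuk,vk->uvi', inner, v)
print(np.allclose(s3, pairwise_cross_matmul(u, v)))  # True

# NumPy >= 1.12 only: let einsum choose the contraction order itself
s_opt = np.einsum('ijk,uj,vk->uvi', eijk, u, v, optimize=True)
print(np.allclose(s3, s_opt))  # True
```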

Some performance figures for NumPy 1.11.0 with a=numpy.random.rand(n,3), b=numpy.random.rand(n,3):

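[Figure: timings of np.cross vs. the nested einsum as a function of n, NumPy 1.11.0 (image: https://i.stack.imgur.com/uVc2X.png)]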

The nested einsum is about twice as fast as cross for the largest n tested.
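A minimal sketch of how timings like these could be collected with the standard timeit module. It assumes the comparison is the all-pairs form used throughout this answer (the figure itself isn't reproduced here, so that is an assumption), and the helper names cross_outer, nested_einsum_outer and best_time are hypothetical. No particular numbers are implied; results will depend on the machine and NumPy version.

```python
import timeit
import numpy as np

eijk = np.zeros((3, 3, 3))
eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1
eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1

def cross_outer(a, b):
    # All-pairs cross products via broadcasting np.cross
    return np.cross(a[:, None, :], b[None, :, :])

def nested_einsum_outer(a, b):
    # Two-operand einsum calls, as in the answer
    return np.einsum('iuk,vk->uvi', np.einsum('ijk,uj->iuk', eijk, a), b)

def best_time(func, a, b, repeat=3, number=3):
    """Best-of-`repeat` mean time per call, in seconds."""
    return min(timeit.repeat(lambda: func(a, b), repeat=repeat, number=number)) / number

results = {}
for n in (10, 100, 1000):
    a = np.random.rand(n, 3)
    b = np.random.rand(n, 3)
    assert np.allclose(cross_outer(a, b), nested_einsum_outer(a, b))
    results[n] = (best_time(cross_outer, a, b), best_time(nested_einsum_outer, a, b))
    print(f"n={n:5d}  cross: {results[n][0] * 1e3:8.3f} ms  "
          f"nested einsum: {results[n][1] * 1e3:8.3f} ms")
```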


answered Oct 24 '22 18:10

Jaime
