I have been asking related questions on this stack for the past week to try to isolate things I didn't understand about using the @jit decorator with Numba in Python. However, I'm hitting a wall, so I'll just write out the whole problem.
The issue at hand is to calculate the minimal distance between pairs of a large number of segments. The segments are represented by their start and end points in 3D. Mathematically, every segment is parameterized as [AB] = A + (B-A)*s, where s is in [0,1], and A and B are the start and end points of the segment. For two such segments, the minimal distance can be computed in closed form, and the formula is given here.
I have already exposed this problem on another thread, and the answer given dealt with replacing the double loops of my code by vectorizing the problem, which, however, hits memory issues for large sets of segments. Therefore, I've decided to stick with the loops and use numba's jit instead.
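For reference, the closed-form solution for a single pair looks like the following plain-NumPy sketch (the helper name is mine, not from the original; note that the formula assumes non-parallel segments, since the denominator vanishes when the direction vectors are collinear, and that for true segments s and t would additionally need clamping to [0,1], a step the code below also omits):

```python
import numpy as np

def segment_pair_distance_sq(p0, p1, q0, q1):
    """Squared minimal distance between the lines through two
    segments [p0,p1] and [q0,q1], via the closed-form parameters
    s and t of the closest points."""
    d1 = p1 - p0          # direction of the first segment
    d2 = q1 - q0          # direction of the second segment
    r = p0 - q0
    denom = np.dot(d1, d1) * np.dot(d2, d2) - np.dot(d1, d2) ** 2
    s = (np.dot(d1, d2) * np.dot(d2, r) - np.dot(d2, d2) * np.dot(d1, r)) / denom
    t = (np.dot(d1, d1) * np.dot(d2, r) - np.dot(d1, d2) * np.dot(d1, r)) / denom
    # squared distance between the two closest points
    return np.sum((p0 + d1 * s - (q0 + d2 * t)) ** 2)
```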
Since the solution to the problem requires a lot of dot products, and numpy's dot product is not supported by numba, I started by implementing my own 3D dot product.
import numpy as np
from numba import jit, autojit, double, float64, float32, void, int32
def my_dot(a,b):
    res = a[0]*b[0]+a[1]*b[1]+a[2]*b[2]
    return res
dot_jit = jit(double(double[:], double[:]))(my_dot) #I know, it's not of much use here.
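As a quick sanity check (this snippet is mine, reusing the question's my_dot), the hand-rolled 3D dot product can be compared against np.dot:

```python
import numpy as np

def my_dot(a, b):
    # hand-rolled 3D dot product, as in the question
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
assert my_dot(a, b) == np.dot(a, b)  # 1*4 + 2*5 + 3*6 = 32.0
```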
The function computing the minimal distance over all pairs of N segments takes as input an Nx6 array (6 coordinates per segment):
def compute_stuff(array_to_compute):
    N = len(array_to_compute)
    con_mat = np.zeros((N,N))
    for i in range(N):
        for j in range(i+1,N):
            p0 = array_to_compute[i,0:3]
            p1 = array_to_compute[i,3:6]
            q0 = array_to_compute[j,0:3]
            q1 = array_to_compute[j,3:6]
            s = ( dot_jit((p1-p0),(q1-q0))*dot_jit((q1-q0),(p0-q0)) - dot_jit((q1-q0),(q1-q0))*dot_jit((p1-p0),(p0-q0)))/( dot_jit((p1-p0),(p1-p0))*dot_jit((q1-q0),(q1-q0)) - dot_jit((p1-p0),(q1-q0))**2 )
            t = ( dot_jit((p1-p0),(p1-p0))*dot_jit((q1-q0),(p0-q0)) - dot_jit((p1-p0),(q1-q0))*dot_jit((p1-p0),(p0-q0)))/( dot_jit((p1-p0),(p1-p0))*dot_jit((q1-q0),(q1-q0)) - dot_jit((p1-p0),(q1-q0))**2 )
            con_mat[i,j] = np.sum( (p0+(p1-p0)*s-(q0+(q1-q0)*t))**2 )
    return con_mat
fast_compute_stuff = jit(double[:,:](double[:,:]))(compute_stuff)
So, compute_stuff(arg) takes a 2D np.array (double[:,:]) as argument, performs a bunch of numba-supported (?) operations, and returns another 2D np.array (double[:,:]). However, running
v = np.random.random( (100,6) )
%timeit compute_stuff(v)
%timeit fast_compute_stuff(v)
I get 134 and 123 ms per loop. Can you shed some light on why I fail to speed up my function? Any feedback would be much appreciated.
Here's my version of your code which is significantly faster:
@jit(nopython=True)
def dot(a,b):
    res = a[0]*b[0]+a[1]*b[1]+a[2]*b[2]
    return res

@jit
def compute_stuff2(array_to_compute):
    N = array_to_compute.shape[0]
    con_mat = np.zeros((N,N))
    p0 = np.zeros(3)
    p1 = np.zeros(3)
    q0 = np.zeros(3)
    q1 = np.zeros(3)
    p0m1 = np.zeros(3)
    p1m0 = np.zeros(3)
    q0m1 = np.zeros(3)
    q1m0 = np.zeros(3)
    p0mq0 = np.zeros(3)
    for i in range(N):
        for j in range(i+1,N):
            for k in xrange(3):
                p0[k] = array_to_compute[i,k]
                p1[k] = array_to_compute[i,k+3]
                q0[k] = array_to_compute[j,k]
                q1[k] = array_to_compute[j,k+3]
                p0m1[k] = p0[k] - p1[k]
                p1m0[k] = -p0m1[k]
                q0m1[k] = q0[k] - q1[k]
                q1m0[k] = -q0m1[k]
                p0mq0[k] = p0[k] - q0[k]
            s = ( dot(p1m0, q1m0)*dot(q1m0, p0mq0) - dot(q1m0, q1m0)*dot(p1m0, p0mq0))/( dot(p1m0, p1m0)*dot(q1m0, q1m0) - dot(p1m0, q1m0)**2 )
            t = ( dot(p1m0, p1m0)*dot(q1m0, p0mq0) - dot(p1m0, q1m0)*dot(p1m0, p0mq0))/( dot(p1m0, p1m0)*dot(q1m0, q1m0) - dot(p1m0, q1m0)**2 )
            for k in xrange(3):
                con_mat[i,j] += (p0[k]+(p1[k]-p0[k])*s-(q0[k]+(q1[k]-q0[k])*t))**2
    return con_mat
And the timings:
In [38]:
v = np.random.random( (100,6) )
%timeit compute_stuff(v)
%timeit fast_compute_stuff(v)
%timeit compute_stuff2(v)
np.allclose(compute_stuff2(v), compute_stuff(v))
#10 loops, best of 3: 107 ms per loop
#10 loops, best of 3: 108 ms per loop
#10000 loops, best of 3: 114 µs per loop
#True
My basic strategy for speeding this up was:

- avoid the array slicing (e.g. array_to_compute[i,0:3]) and the temporary arrays created by expressions like p1-p0 inside the loops, by pre-allocating scratch arrays once and filling them element-wise;
- compute each difference vector once per pair instead of re-computing it for every dot product;
- call the nopython-compiled dot method.

Another thing to note is that for recent versions of numba, what used to be called autojit (i.e. letting numba do type inference on the inputs) and is now just the decorator without type hints is usually just as fast as specifying your input types, in my experience.
Also, timings were run using numba 0.17.0 on OS X using the Anaconda Python distribution with Python 2.7.9.