
Inter segment distance using numba jit, Python

I have been asking related questions on this site for the past week, trying to isolate things I didn't understand about using the @jit decorator with Numba in Python. However, I'm hitting a wall, so I'll just write out the whole problem.

The issue at hand is to calculate the minimal distance between pairs of a large number of segments. The segments are represented by their beginning and endpoints in 3D. Mathematically, every segment is parameterized as [AB] = A + (B-A)*s, where s in [0,1], and A and B are the beginning and endpoints of the segment. For two such segments, the minimal distance can be computed and the formula is given here.
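For reference, a plain-NumPy sketch of that closed-form solution for a single pair (the function name and variable names are my own; s and t are the closest-approach parameters, with no clamping to [0, 1], mirroring the expressions used later in the question):

```python
import numpy as np

def pair_distance_sq(p0, p1, q0, q1):
    # Squared distance at the closest approach of the lines
    # through segments [p0, p1] and [q0, q1].
    d1 = p1 - p0            # direction of the first segment
    d2 = q1 - q0            # direction of the second segment
    r = p0 - q0
    a, e = d1 @ d1, d2 @ d2
    b = d1 @ d2
    c, f = d1 @ r, d2 @ r
    denom = a * e - b ** 2  # zero when the segments are parallel
    s = (b * f - e * c) / denom
    t = (a * f - b * c) / denom
    diff = (p0 + d1 * s) - (q0 + d2 * t)
    return diff @ diff
```

Note the `denom` term vanishes for parallel segments, which this sketch (like the question's code) does not handle.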

I have already posed this problem in another thread, and the answer given there replaced the double loops of my code with a vectorized formulation, which however runs into memory issues for large sets of segments. I've therefore decided to stick with the loops and use numba's jit instead.

Since the solution to the problem requires a lot of dot products, and numpy's dot product is not supported by numba, I started by implementing my own 3D dot product.

import numpy as np
from numba import jit, double

def my_dot(a,b):
    res = a[0]*b[0]+a[1]*b[1]+a[2]*b[2]
    return res

dot_jit = jit(double(double[:], double[:]))(my_dot)    #I know, it's not of much use here.
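As a quick sanity check (my own snippet, not part of the original question), the hand-rolled dot product can be verified against numpy's without needing numba installed:

```python
import numpy as np

def my_dot(a, b):
    # Same 3-element dot product as above.
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
assert my_dot(a, b) == np.dot(a, b)   # both give 32.0
```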

The function computing the minimal distance over all pairs of N segments takes as input an Nx6 array (6 coordinates per segment):

def compute_stuff(array_to_compute):
    N = len(array_to_compute)
    con_mat = np.zeros((N,N))
    for i in range(N):
        for j in range(i+1,N):

            p0 = array_to_compute[i,0:3]
            p1 = array_to_compute[i,3:6]
            q0 = array_to_compute[j,0:3]
            q1 = array_to_compute[j,3:6]

            s = ( dot_jit((p1-p0),(q1-q0))*dot_jit((q1-q0),(p0-q0)) - dot_jit((q1-q0),(q1-q0))*dot_jit((p1-p0),(p0-q0)))/( dot_jit((p1-p0),(p1-p0))*dot_jit((q1-q0),(q1-q0)) - dot_jit((p1-p0),(q1-q0))**2 )
            t = ( dot_jit((p1-p0),(p1-p0))*dot_jit((q1-q0),(p0-q0)) -dot_jit((p1-p0),(q1-q0))*dot_jit((p1-p0),(p0-q0)))/( dot_jit((p1-p0),(p1-p0))*dot_jit((q1-q0),(q1-q0)) - dot_jit((p1-p0),(q1-q0))**2 )

            con_mat[i,j] = np.sum( (p0+(p1-p0)*s-(q0+(q1-q0)*t))**2 ) 

    return con_mat

fast_compute_stuff = jit(double[:,:](double[:,:]))(compute_stuff)

So, compute_stuff(arg) takes a 2D np.array (double[:,:]) as argument, performs a bunch of numba-supported (?) operations, and returns another 2D np.array (double[:,:]). However, with

v = np.random.random( (100,6) )
%timeit compute_stuff(v)
%timeit fast_compute_stuff(v)

I get 134 and 123 ms per loop, respectively. Can you shed some light on why I fail to speed up my function? Any feedback would be much appreciated.

Asked Mar 10 '15 by Mathusalem

1 Answer

Here's my version of your code which is significantly faster:

@jit(nopython=True)
def dot(a,b):
    res = a[0]*b[0]+a[1]*b[1]+a[2]*b[2]
    return res

@jit
def compute_stuff2(array_to_compute):
    N = array_to_compute.shape[0]
    con_mat = np.zeros((N,N))

    p0 = np.zeros(3)
    p1 = np.zeros(3)
    q0 = np.zeros(3)
    q1 = np.zeros(3)

    p0m1 = np.zeros(3)
    p1m0 = np.zeros(3)
    q0m1 = np.zeros(3)
    q1m0 = np.zeros(3)
    p0mq0 = np.zeros(3)

    for i in range(N):
        for j in range(i+1,N):

            for k in xrange(3):
                p0[k] = array_to_compute[i,k]
                p1[k] = array_to_compute[i,k+3]
                q0[k] = array_to_compute[j,k]
                q1[k] = array_to_compute[j,k+3]

                p0m1[k] = p0[k] - p1[k]
                p1m0[k] = -p0m1[k]

                q0m1[k] = q0[k] - q1[k]
                q1m0[k] = -q0m1[k]

                p0mq0[k] = p0[k] - q0[k]

            s = ( dot(p1m0, q1m0)*dot(q1m0, p0mq0) - dot(q1m0, q1m0)*dot(p1m0, p0mq0))/( dot(p1m0, p1m0)*dot(q1m0, q1m0) - dot(p1m0, q1m0)**2 )
            t = ( dot(p1m0, p1m0)*dot(q1m0, p0mq0) - dot(p1m0, q1m0)*dot(p1m0, p0mq0))/( dot(p1m0, p1m0)*dot(q1m0, q1m0) - dot(p1m0, q1m0)**2 )


            for k in xrange(3):
                con_mat[i,j] += (p0[k]+(p1[k]-p0[k])*s-(q0[k]+(q1[k]-q0[k])*t))**2 

    return con_mat

And the timings:

In [38]:

v = np.random.random( (100,6) )
%timeit compute_stuff(v)
%timeit fast_compute_stuff(v)
%timeit compute_stuff2(v)

np.allclose(compute_stuff2(v), compute_stuff(v))

#10 loops, best of 3: 107 ms per loop
#10 loops, best of 3: 108 ms per loop
#10000 loops, best of 3: 114 µs per loop
#True

My basic strategy for speeding this up was:

  • Get rid of all array expressions and explicitly unroll the vectorization (especially since your arrays are so small)
  • Get rid of redundant calculations (repeatedly re-subtracting the same vectors) in the calls to the dot method.
  • Move all array creation to outside of the nested for loops so that numba could potentially do some loop lifting. This also avoids creating many small arrays which is costly. It's better to allocate once and reuse the memory.
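The allocate-once idea in the last point can be sketched in isolation (my own illustrative example, not from the answer; `row_norms` is a hypothetical helper):

```python
import numpy as np

def row_norms(mat):
    n, m = mat.shape
    out = np.empty(n)
    buf = np.empty(m)      # allocated once, outside the loop
    for i in range(n):
        for k in range(m):
            buf[k] = mat[i, k] * mat[i, k]   # scalar writes, no temporaries
        out[i] = np.sqrt(buf.sum())
    return out
```

Creating `buf` inside the loop instead would allocate a fresh small array on every iteration, which is exactly the cost the answer's version avoids.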

Another thing to note: for recent versions of numba, what used to be called autojit (i.e. letting numba infer the types from the inputs), and is now just the bare decorator without type hints, is in my experience usually just as fast as specifying the input types.

Also timings were run using numba 0.17.0 on OS X using the Anaconda python distribution with Python 2.7.9.

Answered Oct 21 '22 by JoshAdel