Pairwise displacement vectors among a set of points

I have an array of N points in d dimensions, shape (N, d), and I'd like to make a new array of all the displacement vectors for each pair, shape (N choose 2, d). If I just want the magnitudes of these vectors, I can use pdist from scipy.spatial.distance.
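For instance, a minimal sketch (the array sizes here are illustrative) of getting just the magnitudes with pdist:

import numpy as np
from scipy.spatial.distance import pdist

points = np.random.rand(5, 3)  # N = 5 points in d = 3 dimensions
magnitudes = pdist(points)     # condensed form, shape (N choose 2,) = (10,)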

It would be great if I could just do

pdist(points, lambda u, v: u - v)

but the metric function must return a scalar (it raises ValueError: setting an array element with a sequence).


My solution is to use np.triu_indices:

i, j = np.triu_indices(len(points), 1)
displacements = points[i] - points[j]

This is about 20-30 times slower than using pdist (I compare by taking the magnitude of displacements, though this is not the time-consuming part; I presume the cost is actually in building the upper-triangle indices and the fancy indexing).
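For reference, a minimal end-to-end sketch of this approach (sizes are illustrative), checking the magnitudes against pdist:

import numpy as np
from scipy.spatial.distance import pdist

points = np.random.rand(100, 3)
i, j = np.triu_indices(len(points), 1)
displacements = points[i] - points[j]  # shape (N choose 2, d)

# triu_indices pairs come in the same order as pdist's condensed output,
# so the row-wise magnitudes should match exactly.
assert np.allclose(np.linalg.norm(displacements, axis=1), pdist(points))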

asked Mar 13 '14 by askewchan

2 Answers

A straightforward approach would be

import itertools

dis_vectors = [l - r for l, r in itertools.combinations(points, 2)]

but I doubt that it is fast. Actually %timeit says:

For 3 points:

list : 13 us
pdist: 24 us

But already for 27 points:

list : 798 us
pdist: 35.2 us

About how many points are we talking here?

Another possibility is something like:

import numpy
from functools import reduce  # reduce is not a builtin in Python 3
from operator import mul
from fractions import Fraction

def binomial_coefficient(n, k):
    # credit to http://stackoverflow.com/users/226086/nas-banov
    return int(reduce(mul, (Fraction(n - i, i + 1) for i in range(k)), 1))

def pairwise_displacements(a):
    n = a.shape[0]
    d = a.shape[1]
    c = binomial_coefficient(n, 2)

    out = numpy.zeros((c, d))

    l = 0
    r = l + n - 1
    for sl in range(1, n):  # no point1 - point1!
        # Shift the array against itself by sl and subtract; this fills
        # the next n - sl output rows with all pairs at index offset sl.
        out[l:r] = a[:n - sl] - a[sl:]
        l = r
        r += n - (sl + 1)
    return out

This simply "slides" the array against itself at every index offset and performs a (broadcastable) subtraction in each step. Note that each pair is taken only once, and equal pairs (e.g. point1 - point1) are skipped.
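A quick usage sketch (sizes are illustrative) comparing against pdist; the rows come out grouped by index offset rather than by first index, so the magnitudes are compared after sorting. In modern Python, math.comb(n, 2) could also replace the binomial helper.

import numpy as np
from scipy.spatial.distance import pdist

points = np.random.rand(50, 3)
disp = pairwise_displacements(points)

# Same multiset of pair distances as pdist, just in a different row order.
assert np.allclose(np.sort(np.linalg.norm(disp, axis=1)), np.sort(pdist(points)))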

This function still performs well in the 1000-point range, at 31.3 ms, whereas pdist is still faster at 20.7 ms; the list comprehension takes third place with 1.23 s.

answered by loli


If you compute the full Cartesian product of differences, flatten the resulting 2D array, and create your own indices to extract the upper triangle, you can get it to be "only" 6x slower than pdist:

In [39]: points = np.random.rand(1000, 2)

In [40]: %timeit pdist(points)
100 loops, best of 3: 5.81 ms per loop

In [41]: %%timeit
    ...: n = len(points)
    ...: rng = np.arange(1, n)
    ...: idx = np.arange(n *(n-1) // 2) + np.repeat(np.cumsum(rng), rng[::-1])
    ...: np.take((points[:, None] - points).reshape(-1, 2), idx, axis=0)
    ...: 
10 loops, best of 3: 33.9 ms per loop
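To unpack the index arithmetic: the strict upper-triangle entry (i, j) of the flattened n x n difference array sits at flat position i*n + j, and the cumsum/repeat expression generates exactly those positions. A small check (n is illustrative):

import numpy as np

n = 4
rng = np.arange(1, n)
idx = np.arange(n * (n - 1) // 2) + np.repeat(np.cumsum(rng), rng[::-1])

i, j = np.triu_indices(n, 1)
assert np.array_equal(idx, i * n + j)  # [1, 2, 3, 6, 7, 11] for n = 4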

You can also speed up your own solution by creating the indices yourself and using take instead of fancy indexing:

In [75]: %%timeit
    ...: n = len(points)
    ...: rng = np.arange(1, n)
    ...: idx1 = np.repeat(rng - 1, rng[::-1])
    ...: idx2 = np.arange(n*(n-1)//2) + np.repeat(n - np.cumsum(rng[::-1]), rng[::-1])
    ...: np.take(points, idx1, axis=0) - np.take(points, idx2, axis=0)
    ...: 
10 loops, best of 3: 38.8 ms per loop
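These hand-built index arrays reproduce np.triu_indices exactly; a small sanity check (n is illustrative):

import numpy as np

n = 4
rng = np.arange(1, n)
idx1 = np.repeat(rng - 1, rng[::-1])
idx2 = np.arange(n * (n - 1) // 2) + np.repeat(n - np.cumsum(rng[::-1]), rng[::-1])

i, j = np.triu_indices(n, 1)
assert np.array_equal(idx1, i) and np.array_equal(idx2, j)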
answered by Jaime