I need to do a few hundred million euclidean distance calculations every day in a Python project.
Here is what I started out with:
import numpy as np

def euclidean_dist_square(x, y):
    diff = np.array(x) - np.array(y)
    return np.dot(diff, diff)
This is quite fast, and I have already dropped the sqrt since I only need to rank items (nearest-neighbor search). It is still the bottleneck of the script, though. Therefore I have written a C extension that calculates the distance. The calculation is always done on 128-dimensional vectors.
#include "euclidean.h"
#include <math.h>
double euclidean(double x[128], double y[128])
{
double Sum;
for(int i=0;i<128;i++)
{
Sum = Sum + pow((x[i]-y[i]),2.0);
}
return Sum;
}
Complete code for the extension is here: https://gist.github.com/herrbuerger/bd63b73f3c5cf1cd51de
Now this gives a nice speedup in comparison to the numpy version.
But is there any way to speed this up further (this is my first C extension ever so I assume there is)? With the number of times this function is used every day, every microsecond would actually provide a benefit.
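To put a number on "every microsecond", a small timeit harness can measure the per-call cost. The sketch below times only the pure-NumPy version; the compiled extension's function can be dropped into the same harness once the module from the gist is built and importable.

import timeit

import numpy as np

def euclidean_dist_square(x, y):
    # the pure-NumPy version from above, repeated here so the snippet runs on its own
    diff = np.array(x) - np.array(y)
    return np.dot(diff, diff)

x = np.random.rand(128)
y = np.random.rand(128)

# average cost of one call in microseconds; swap in the C extension's function
# to time it the same way
n = 100000
per_call = timeit.timeit(lambda: euclidean_dist_square(x, y), number=n) / n
print('%.2f microseconds per call' % (per_call * 1e6))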
Some of you might suggest porting this completely from Python to another language, unfortunately this is a larger project and not an option :(
Thanks.
Edit
I have posted this question on CodeReview: https://codereview.stackexchange.com/questions/52218/possible-optimizations-for-calculating-squared-euclidean-distance
I will wait an hour before deleting this question, in case someone has already started writing an answer here.
The fastest way I know of to compute Euclidean distances in NumPy is the one in scikit-learn, which can be summed up as:
import numpy as np

def squared_distances(X, Y):
    """Return a matrix of squared distances between each pair of rows in X and Y."""
    # http://stackoverflow.com/a/19094808/166749
    X_row_norms = np.einsum('ij,ij->i', X, X)
    Y_row_norms = np.einsum('ij,ij->i', Y, Y)
    distances = np.dot(X, Y.T)
    distances *= -2
    distances += X_row_norms[:, np.newaxis]
    distances += Y_row_norms
    np.maximum(distances, 0, distances)  # get rid of small negatives from rounding; optional
    return distances
The bottleneck in this piece of code is matrix multiplication (np.dot), so make sure your NumPy is linked against a good BLAS implementation; with a multithreaded BLAS on a multicore machine and large enough input matrices, it should be faster than anything you can whip up in C. Note that it relies on the binomial formula
||x - y||² = ||x||² + ||y||² - 2 x⋅y
and that either X_row_norms or Y_row_norms can be cached across invocations for the k-NN use case.
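To make the caching concrete, here is a minimal sketch (my own illustration, not scikit-learn code) of single-query nearest-neighbor lookups against a fixed 128-dimensional database Y. Since ||x||² is the same for every candidate, it drops out of the ranking entirely:

import numpy as np

rng = np.random.RandomState(0)
Y = rng.rand(10000, 128)                     # the fixed database of vectors
Y_row_norms = np.einsum('ij,ij->i', Y, Y)    # computed once, reused for every query

def nearest_neighbor(x):
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y; the ||x||^2 term is constant
    # across candidates, so argmin over the expression below gives the same result
    scores = Y_row_norms - 2.0 * np.dot(Y, x)
    return np.argmin(scores)

query = rng.rand(128)
print(nearest_neighbor(query))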
(I'm a coauthor of this code, and I spent quite some time optimizing both it and the SciPy implementation; scikit-learn is faster at the expense of some accuracy, but for k-NN that shouldn't matter too much. The SciPy implementation, available in scipy.spatial.distance, is actually an optimized version of the code you just wrote and is more accurate.)
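For completeness, the SciPy route can be used like this; cdist with the 'sqeuclidean' metric returns the same kind of squared-distance matrix (the shapes below are arbitrary, purely for illustration):

import numpy as np
from scipy.spatial.distance import cdist

X = np.random.rand(5, 128)
Y = np.random.rand(7, 128)

D = cdist(X, Y, 'sqeuclidean')   # squared Euclidean distances, shape (5, 7)
print(D.shape)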