Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Non-trivial sums of outer products without temporaries in numpy

The actual problem I wish to solve is, given a set of N unit vectors and another set of M vectors calculate for each of the unit vectors the average of the absolute value of the dot product of it with every one of the M vectors. Essentially this is calculating the outer product of the two matrices and summing and averaging with an absolute value stuck in-between.

For N and M not too large this is not hard and there are many ways to proceed (see below). The problem is when N and M are large the temporaries created are huge and provide a practical limitation for the provided approach. Can this calculation be done without creating temporaries? The main difficulty I have is due to the presence of the absolute value. Are there general techniques for "threading" such calculations?

As an example consider the following code

N = 7
M = 5

# Create the unit vectors, just so we have some examples,
# this is not meant to be elegant
phi = np.random.rand(N)*2*np.pi
ctheta = np.random.rand(N)*2 - 1
stheta = np.sqrt(1-ctheta**2)
nhat = np.array([stheta*np.cos(phi), stheta*np.sin(phi), ctheta]).T

# Create the other vectors
m = np.random.rand(M,3)

# Calculate the quantity we desire, here using broadcasting.
S = np.average(np.abs(np.sum(nhat*m[:,np.newaxis,:], axis=-1)), axis=0)

This is great, S is now an array of length N and contains the desired results. Unfortunately in the process we have created some potentially huge arrays. The result of

np.sum(nhat*m[:,np.newaxis,:], axis=-1)

is a M X N array. The final result, of course, is only of size N. Start increasing the sizes of N and M and we quickly run into a memory error.

As noted above, if the absolute value were not required then we could proceed as follows, now using einsum()

T = np.einsum('ik,jk,j', nhat, m, np.ones(M)) / M

This works and works quickly even for quite large N and M . For the specific problem I need to include the abs() but a more general solution (perhaps a more general ufunc) would also be of interest.

like image 773
Craig J Copi Avatar asked Jul 12 '13 21:07

Craig J Copi


People also ask

How do you use outer product in numpy?

To get the Outer product of two arrays, use the numpy. outer() method in Python. The 1st parameter a is the first input vector. Input is flattened if not already 1-dimensional.

What does numpy einsum do?

einsum. Evaluates the Einstein summation convention on the operands. Using the Einstein summation convention, many common multi-dimensional, linear algebraic array operations can be represented in a simple fashion.

Is einsum fast?

einsum is clearly faster. Actually, twice as fast as numpy's built-in functions and, well, 6 times faster than loops, in this case.


1 Answers

Based on some of the comments it seems that using cython is the best way to go. I have foolishly never looked into using cython. It turns out to be relatively easy to produce working code.

After some searching I put together the following cython code. This is not the most general code, probably not the best way to write it, and can probably be made more efficient. Even so, it is only about 25% slower than the einsum() code in the original question so it isn't too bad! It has been written to work explicitly with arrays created as done in the original question (hence the assumed modes of the input arrays).
Despite the caveats it does provide a reasonably efficient solution to the original problem and can serve as a starting point in similar situations.

import numpy as np
cimport numpy as np
import cython
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef inline double d_abs (double a) : return a if a >= 0 else -a

@cython.boundscheck(False)
@cython.wraparound(False)
def process_vectors (np.ndarray[DTYPE_t, ndim=2, mode="fortran"] nhat not None,
                     np.ndarray[DTYPE_t, ndim=2, mode="c"] m not None) :
    if nhat.shape[1] != m.shape[1] :
        raise ValueError ("Arrays must contain vectors of the same dimension")
    cdef Py_ssize_t imax = nhat.shape[0]
    cdef Py_ssize_t jmax = m.shape[0]
    cdef Py_ssize_t kmax = nhat.shape[1] # same as m.shape[1]
    cdef np.ndarray[DTYPE_t, ndim=1] S = np.zeros(imax, dtype=DTYPE)
    cdef Py_ssize_t i, j, k
    cdef DTYPE_t val, tmp
    for i in range(imax) :
        val = 0
        for j in range(jmax) :
            tmp = 0
            for k in range(kmax) :
                tmp += nhat[i,k] * m[j,k]
            val += d_abs(tmp)
        S[i] = val / jmax
    return S
like image 69
Craig J Copi Avatar answered Oct 05 '22 06:10

Craig J Copi