Non-trivial sums of outer products without temporaries in numpy

Tags:

The actual problem I wish to solve is, given a set of N unit vectors and another set of M vectors calculate for each of the unit vectors the average of the absolute value of the dot product of it with every one of the M vectors. Essentially this is calculating the outer product of the two matrices and summing and averaging with an absolute value stuck in-between.

For N and M not too large this is not hard and there are many ways to proceed (see below). The problem is when N and M are large the temporaries created are huge and provide a practical limitation for the provided approach. Can this calculation be done without creating temporaries? The main difficulty I have is due to the presence of the absolute value. Are there general techniques for "threading" such calculations?

As an example consider the following code

N = 7
M = 5

# Create the unit vectors, just so we have some examples,
# this is not meant to be elegant
phi = np.random.rand(N)*2*np.pi
ctheta = np.random.rand(N)*2 - 1
stheta = np.sqrt(1-ctheta**2)
nhat = np.array([stheta*np.cos(phi), stheta*np.sin(phi), ctheta]).T

# Create the other vectors
m = np.random.rand(M,3)

# Calculate the quantity we desire, here using broadcasting.
S = np.average(np.abs(np.sum(nhat*m[:,np.newaxis,:], axis=-1)), axis=0)

This is great, S is now an array of length N and contains the desired results. Unfortunately in the process we have created some potentially huge arrays. The result of

np.sum(nhat*m[:,np.newaxis,:], axis=-1)

is a M X N array. The final result, of course, is only of size N. Start increasing the sizes of N and M and we quickly run into a memory error.

As noted above, if the absolute value were not required then we could proceed as follows, now using einsum()

T = np.einsum('ik,jk,j', nhat, m, np.ones(M)) / M

This works and works quickly even for quite large N and M . For the specific problem I need to include the abs() but a more general solution (perhaps a more general ufunc) would also be of interest.

773

asked Jul 12 '13 21:07

Craig J Copi

1 Answers

Based on some of the comments it seems that using cython is the best way to go. I have foolishly never looked into using cython. It turns out to be relatively easy to produce working code.

After some searching I put together the following cython code. This is not the most general code, probably not the best way to write it, and can probably be made more efficient. Even so, it is only about 25% slower than the einsum() code in the original question so it isn't too bad! It has been written to work explicitly with arrays created as done in the original question (hence the assumed modes of the input arrays).
Despite the caveats it does provide a reasonably efficient solution to the original problem and can serve as a starting point in similar situations.

import numpy as np
cimport numpy as np
import cython
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef inline double d_abs (double a) : return a if a >= 0 else -a

@cython.boundscheck(False)
@cython.wraparound(False)
def process_vectors (np.ndarray[DTYPE_t, ndim=2, mode="fortran"] nhat not None,
                     np.ndarray[DTYPE_t, ndim=2, mode="c"] m not None) :
    if nhat.shape[1] != m.shape[1] :
        raise ValueError ("Arrays must contain vectors of the same dimension")
    cdef Py_ssize_t imax = nhat.shape[0]
    cdef Py_ssize_t jmax = m.shape[0]
    cdef Py_ssize_t kmax = nhat.shape[1] # same as m.shape[1]
    cdef np.ndarray[DTYPE_t, ndim=1] S = np.zeros(imax, dtype=DTYPE)
    cdef Py_ssize_t i, j, k
    cdef DTYPE_t val, tmp
    for i in range(imax) :
        val = 0
        for j in range(jmax) :
            tmp = 0
            for k in range(kmax) :
                tmp += nhat[i,k] * m[j,k]
            val += d_abs(tmp)
        S[i] = val / jmax
    return S

answered Oct 05 '22 06:10

Craig J Copi

Related questions
                            
                                Querying numerical OID in PySNMP?
                            
                                How can I perform a 2-way repeated measures anova using statsmodels?
                            
                                Django: order a queryset by the sum of annotated fields?
                            
                                How to build a GUI application with Python 3.3 and Qt 5? [closed]
                            
                                Specifying dictionary argument with dict() or braces matters in set_bbox
                            
                                Getting output of pybrain prediction as array
                            
                                couldn't remove origin point in matplotlib polycollection
                            
                                Global name in Python
                            
                                Is chronological order of logging messages guaranteed?
                            
                                Python: Generating All Pairwise-Unique Pairings
                            
                                socket-based simple python chat program
                            
                                Parsing Python function calls to get argument positions
                            
                                Kivy: How to make widget behave like overflow:hidden
                            
                                Scipy Sparse Matrix - Dense Vector Multiplication Performance - Blocks vs Large Matrix
                            
                                How do I use a python static checker with Eclipse?
                            
                                How to customize module name for Sphinx
                            
                                How can I copy from an html file to the clipboard in Python in formatted text?
                            
                                On Mac, how to create a drag & drop app for a python script where the script name rather than Python shows in the MenuBar (etc.)?
                            
                                Using factory_boy with SQLAlchemy and class methods
                            
                                Problems with autodoc and explicitly specified instance attributes

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Non-trivial sums of outer products without temporaries in numpy

Tags:

python

optimization

numpy

Craig J Copi

People also ask

1 Answers

Craig J Copi

Recent Activity

Donate For Us