Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I vectorize this triple-loop over 2d arrays in numpy?

Can I eliminate all Python loops in this computation:

result[i,j,k] = (x[i] * y[j] * z[k]).sum()

where x[i], y[j], z[k] are vectors of length N and x,y,z have first dimensions with length A,B,C s.t. output is shape (A,B,C) and each element is the sum of a triple-product (element-wise).

I can get it down from 3 to 1 loops (code below), but am stuck trying to eliminate the last loop.

If necessary I could make A=B=C (via small amount of padding).

# Example with 3 loops, 2 loops, 1 loop (testing omitted)

N = 100 # more like 100k in real problem
A =   2 # more like 20 in real problem
B =   3 # more like 20 in real problem
C =   4 # more like 20 in real problem

import numpy
x = numpy.random.rand(A, N)
y = numpy.random.rand(B, N)
z = numpy.random.rand(C, N)

# outputs of each variant
result_slow = numpy.empty((A,B,C))
result_vec_C = numpy.empty((A,B,C))
result_vec_CB = numpy.empty((A,B,C))

# 3 nested loops
for i in range(A):
    for j in range(B):
        for k in range(C):
            result_slow[i,j,k] = (x[i] * y[j] * z[k]).sum()

# vectorize loop over C (2 nested loops)
for i in range(A):
    for j in range(B):
        result_vec_C[i,j,:] = (x[i] * y[j] * z).sum(axis=1)

# vectorize one C and B (one loop)
for i in range(A):
    result_vec_CB[i,:,:] = numpy.dot(x[i] * y, z.transpose())

numpy.testing.assert_almost_equal(result_slow, result_vec_C)
numpy.testing.assert_almost_equal(result_slow, result_vec_CB)
like image 656
Joseph Hastings Avatar asked Jun 29 '12 16:06

Joseph Hastings


People also ask

How does NumPy do vectorization?

The vectorized function evaluates pyfunc over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy. The data type of the output of vectorized is determined by calling the function with the first element of the input.

Does NumPy vectorize fast?

Again, some have observed vectorize to be faster than normal for loops, but even the NumPy documentation states: “The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.”

How do you iterate a 2D array in python?

Traversing in a 2D array in python can be done by using a for a loop. We can iterate through the outer array first, and then at each element of the outer array, we have another array, our inner array containing the elements. So for each inner array, we run a loop to traverse its elements.

What is array vectorization?

"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.


1 Answers

If you are using numpy > 1.6, there is the awesome np.einsum function:

np.einsum('im,jm,km->ijk',x,y,z)

Which is equivalent to your looped versions. I'm not sure how this will fair on efficiency once you get up to the size of your arrays in the real problem (I'm actually getting a segfault on my machine, when I move to those sizes). The other solution which I often prefer for these sorts of problems is re-writing the method using cython.

like image 52
JoshAdel Avatar answered Sep 29 '22 12:09

JoshAdel