An algorithm I'm working on requires computing, in a couple of places, a type of matrix triple product. The operation takes three square matrices with identical dimensions and produces a 3-index tensor. Labeling the operands A, B and C, the (i,j,k)-th element of the result is
X[i,j,k] = \sum_a A[i,a] B[a,j] C[k,a]
In numpy, you can compute this with einsum('ia,aj,ka->ijk', A, B, C).
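For reference, the same definition written out as a naive triple loop (a slow, minimal sketch, included only to make the indexing explicit) would be:

import numpy as np

def triple_product_reference(A, B, C):
    # Naive reference implementation of
    #   X[i,j,k] = sum_a A[i,a] * B[a,j] * C[k,a]
    m, d = A.shape
    n = B.shape[1]
    p = C.shape[0]
    X = np.empty((m, n, p))
    for i in range(m):
        for j in range(n):
            for k in range(p):
                X[i, j, k] = np.sum(A[i, :] * B[:, j] * C[k, :])
    return X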
Questions: Is there a faster way to compute this operation in numpy?
np.einsum is really hard to beat, but in rare cases you can still beat it if you can bring matrix multiplication into the computation. After a few trials, it seems you can bring in matrix multiplication with np.dot to surpass the performance of np.einsum('ia,aj,ka->ijk', A, B, C).
The basic idea is that we break down the "all-einsum" operation into a combination of np.einsum and np.dot, as listed below:

1. A:[i,a] and B:[a,j] are multiplied with np.einsum (keeping the a index) to get us a 3D array: [i,j,a].
2. This 3D array is reshaped into a 2D array: [i*j,a], and the third array, C[k,a], is transposed to [a,k], with the intention of performing matrix multiplication between these two, giving us [i*j,k] as the matrix product, as we lose the index [a] there.
3. The product is reshaped into a 3D array: [i,j,k] for the final output.

Here's the implementation for the first version discussed so far -
import numpy as np

def tensor_prod_v1(A, B, C):  # First version of proposed method
    # Shape parameters
    m, d = A.shape
    n = B.shape[1]
    p = C.shape[0]

    # Calculate A[i,a]*B[a,j] to get a 3D array with indices as (i,j,a)
    AB = np.einsum('ia,aj->ija', A, B)

    # Calculate entire summation losing the a-th index & reshape to desired shape
    return np.dot(AB.reshape(m*n, d), C.T).reshape(m, n, p)
Since we are summing over the a-th index across all three input arrays, there are three different choices of which pair of arrays to combine first. The code listed earlier combined (A,B). We can also combine (A,C) or (B,C), giving us two more variations, as listed next:
def tensor_prod_v2(A, B, C):
    # Shape parameters
    m, d = A.shape
    n = B.shape[1]
    p = C.shape[0]

    # Calculate A[i,a]*C[k,a] to get a 3D array with indices as (i,k,a)
    AC = np.einsum('ia,ja->ija', A, C)

    # Calculate entire summation losing the a-th index & reshape to desired shape
    return np.dot(AC.reshape(m*p, d), B).reshape(m, p, n).transpose(0, 2, 1)
def tensor_prod_v3(A, B, C):
    # Shape parameters
    m, d = A.shape
    n = B.shape[1]
    p = C.shape[0]

    # Calculate B[a,j]*C[k,a] to get a 3D array with indices as (a,j,k)
    BC = np.einsum('ai,ja->aij', B, C)

    # Calculate entire summation losing the a-th index & reshape to desired shape
    return np.dot(A, BC.reshape(d, n*p)).reshape(m, n, p)
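Before looking at timings, a quick consistency check (a minimal sketch; the small shapes are arbitrary and it assumes the three functions above are in scope) confirms that all variants agree with the all-einsum result:

import numpy as np

# Arbitrary small shapes, just for the equivalence check
L1, L2, L3, al = 4, 5, 6, 7
A = np.random.rand(L1, al)
B = np.random.rand(al, L2)
C = np.random.rand(L3, al)

ref = np.einsum('ia,aj,ka->ijk', A, B, C)
print(np.allclose(tensor_prod_v1(A, B, C), ref))   # expected: True
print(np.allclose(tensor_prod_v2(A, B, C), ref))   # expected: True
print(np.allclose(tensor_prod_v3(A, B, C), ref))   # expected: True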
Depending upon the shapes of the input arrays, the different approaches would yield different speedups with respect to each other, but we are hopeful that all would be better than the all-einsum approach. The performance numbers are listed in the next section.
This is probably the most important section, as we look into the speedup numbers of the three variations of the proposed approach over the all-einsum approach originally proposed in the question.
Dataset #1 (Equal shaped arrays) :
In [494]: L1 = 200
...: L2 = 200
...: L3 = 200
...: al = 200
...:
...: A = np.random.rand(L1,al)
...: B = np.random.rand(al,L2)
...: C = np.random.rand(L3,al)
...:
In [495]: %timeit tensor_prod_v1(A,B,C)
...: %timeit tensor_prod_v2(A,B,C)
...: %timeit tensor_prod_v3(A,B,C)
...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
...:
1 loops, best of 3: 470 ms per loop
1 loops, best of 3: 391 ms per loop
1 loops, best of 3: 446 ms per loop
1 loops, best of 3: 3.59 s per loop
Dataset #2 (Bigger A) :
In [497]: L1 = 1000
...: L2 = 100
...: L3 = 100
...: al = 100
...:
...: A = np.random.rand(L1,al)
...: B = np.random.rand(al,L2)
...: C = np.random.rand(L3,al)
...:
In [498]: %timeit tensor_prod_v1(A,B,C)
...: %timeit tensor_prod_v2(A,B,C)
...: %timeit tensor_prod_v3(A,B,C)
...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
...:
1 loops, best of 3: 442 ms per loop
1 loops, best of 3: 355 ms per loop
1 loops, best of 3: 303 ms per loop
1 loops, best of 3: 2.42 s per loop
Dataset #3 (Bigger B) :
In [500]: L1 = 100
...: L2 = 1000
...: L3 = 100
...: al = 100
...:
...: A = np.random.rand(L1,al)
...: B = np.random.rand(al,L2)
...: C = np.random.rand(L3,al)
...:
In [501]: %timeit tensor_prod_v1(A,B,C)
...: %timeit tensor_prod_v2(A,B,C)
...: %timeit tensor_prod_v3(A,B,C)
...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
...:
1 loops, best of 3: 474 ms per loop
1 loops, best of 3: 247 ms per loop
1 loops, best of 3: 439 ms per loop
1 loops, best of 3: 2.26 s per loop
Dataset #4 (Bigger C) :
In [503]: L1 = 100
...: L2 = 100
...: L3 = 1000
...: al = 100
...:
...: A = np.random.rand(L1,al)
...: B = np.random.rand(al,L2)
...: C = np.random.rand(L3,al)
In [504]: %timeit tensor_prod_v1(A,B,C)
...: %timeit tensor_prod_v2(A,B,C)
...: %timeit tensor_prod_v3(A,B,C)
...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
...:
1 loops, best of 3: 250 ms per loop
1 loops, best of 3: 358 ms per loop
1 loops, best of 3: 362 ms per loop
1 loops, best of 3: 2.46 s per loop
Dataset #5 (Bigger a-th dimension length) :
In [506]: L1 = 100
...: L2 = 100
...: L3 = 100
...: al = 1000
...:
...: A = np.random.rand(L1,al)
...: B = np.random.rand(al,L2)
...: C = np.random.rand(L3,al)
...:
In [507]: %timeit tensor_prod_v1(A,B,C)
...: %timeit tensor_prod_v2(A,B,C)
...: %timeit tensor_prod_v3(A,B,C)
...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
...:
1 loops, best of 3: 373 ms per loop
1 loops, best of 3: 269 ms per loop
1 loops, best of 3: 299 ms per loop
1 loops, best of 3: 2.38 s per loop
Conclusions: We are seeing a speedup of 8x-10x with the variations of the proposed approach over the all-einsum approach listed in the question.
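For completeness, the timings above use IPython's %timeit magic; in a plain script, a comparable (though rougher) measurement could be taken with the standard timeit module. This is just a sketch assuming the three functions above are defined and using the Dataset #1 sizes; absolute numbers will vary with machine and BLAS build:

import timeit
import numpy as np

L1 = L2 = L3 = al = 200          # Dataset #1 sizes
A = np.random.rand(L1, al)
B = np.random.rand(al, L2)
C = np.random.rand(L3, al)

candidates = [('tensor_prod_v1', tensor_prod_v1),
              ('tensor_prod_v2', tensor_prod_v2),
              ('tensor_prod_v3', tensor_prod_v3),
              ('all-einsum', lambda A, B, C: np.einsum('ia,aj,ka->ijk', A, B, C))]

for name, func in candidates:
    t = timeit.timeit(lambda: func(A, B, C), number=3) / 3
    print('%-16s %.3f s per call' % (name, t))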
Let n x n be the matrix sizes. In Matlab, you can:

1. Combine A and C into an n^2 x n matrix AC, such that the rows of AC correspond to all combinations of rows of A and C (each row of AC is the element-wise product of a row of A and a row of C).
2. Multiply AC by B. That gives the desired result, only in a different shape.
3. Reshape and permute the dimensions to obtain the result in the desired shape.

Code:
AC = reshape(bsxfun(@times, permute(A, [1 3 2]), permute(C, [3 1 2])), n^2, n); % // 1
X = permute(reshape((AC*B).', n, n, n), [2 1 3]); %'// 2, 3
Check with a verbatim loop-based approach:
%// Example data:
n = 3;
A = rand(n,n);
B = rand(n,n);
C = rand(n,n);
%// Proposed approach:
AC = reshape(bsxfun(@times, permute(A, [1 3 2]), permute(C, [3 1 2])), n^2, n);
X = permute(reshape((AC*B).', n, n, n), [2 1 3]); %'
%// Loop-based approach:
Xloop = NaN(n,n,n); %// initialize
for ii = 1:n
    for jj = 1:n
        for kk = 1:n
            Xloop(ii,jj,kk) = sum(A(ii,:).*B(:,jj).'.*C(kk,:)); %'
        end
    end
end
%// Compute maximum relative difference:
max(max(max(abs(X./Xloop-1))))
ans =
2.2204e-16
The maximum relative difference is of the order of eps, so the result is correct to within numerical precision.
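For comparison with the NumPy answer above, the same idea translates fairly directly into NumPy, with broadcasting in place of bsxfun. This is a rough sketch (the function name is made up, not from either answer), and it ends up essentially equivalent to tensor_prod_v2:

import numpy as np

def tensor_prod_broadcast(A, B, C):
    # NumPy sketch of the MATLAB broadcasting approach:
    #   AC[i,k,a] = A[i,a] * C[k,a]   (the bsxfun(@times, ...) step),
    #   then a single matrix product with B sums over a.
    m, d = A.shape
    n = B.shape[1]
    p = C.shape[0]
    AC = A[:, None, :] * C[None, :, :]             # shape (m, p, d)
    X = AC.reshape(m * p, d).dot(B)                # shape (m*p, n), sums over a
    return X.reshape(m, p, n).transpose(0, 2, 1)   # reorder axes to (i, j, k)

In other words, the MATLAB approach and tensor_prod_v2 are the same algorithm; they differ only in how the broadcasted product AC is formed.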