Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does `numpy.einsum` work?

The correct way of writing a summation in terms of Einstein summation is a puzzle to me, so I want to try it in my code. I have succeeded in a few cases but mostly with trial and error.

Now there is a case that I cannot figure out. First, a basic question. For two matrices A and B that are Nx1 and 1xN, respectively, AB is NxN but BA is 1x1. When I want to calculate the NxN case with np.einsum I can do:

import numpy as np

a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print np.einsum('ij,ji->ij', a, b)

and the final array is 2x2. However

a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print np.einsum('ij,ij->ij', a, b)

returns a 1x2 array. I don't quite understand why this does not give the correct result. For example for the above case numpy's guide says that arrows can be used to force summation or stop it from taking place. But that seems quite vague to me; in the above case I don't understand how numpy decides about the final size of the output array based on the order of indices (which apparently changes).

Formally I know the following: When there is nothing on the right side of the arrow, one can write the summation mathematically as $\sum\limits_{i=0}^{N}\sum\limits_{j=0}^{M} A_{ij}B_{ij}$ for np.einsum('ij,ij',A,B), but when there is an arrow I am clueless how to interpret it in terms of a formal mathematical expression.

like image 708
Cupitor Avatar asked Dec 11 '22 03:12

Cupitor


1 Answers

In [22]: a
Out[22]: array([[1, 2]])
In [23]: b
Out[23]: array([[2, 3]])
In [24]: np.einsum('ij,ij->ij',a,b)
Out[24]: array([[2, 6]])
In [29]: a*b
Out[29]: array([[2, 6]])

Here the repetition of the indices in all parts, including output, is interpreted as element by element multiplication. Nothing is summed. a[i,j]*b[i,j] = c[i,j] for all i,j.

In [25]: np.einsum('ij,ji->ij',a,b)
Out[25]: 
array([[2, 4],
       [3, 6]])
In [28]: np.dot(a.T,b).T
Out[28]: 
array([[2, 4],
       [3, 6]])
In [38]: np.outer(a,b)
Out[38]: 
array([[2, 3],
       [4, 6]])

Again no summation because the same indices appear on left and right sides. a[i,j]*b[j,i] = c[i,j], in other words:

[[1*2, 2*2],
 [1*3, 2*3]]

In effect an outer product. A look at how a is broadcasted against b.T might help:

In [69]: np.broadcast_arrays(a,b.T)
Out[69]: 
[array([[1, 2],
        [1, 2]]), 
 array([[2, 2],
        [3, 3]])]

On the left side of the statement, repeated indices indicate which dimensions are multiplied. Matching left and right sides determines whether they are summed or not.

np.einsum('ij,ji->j',a,b) # array([ 5, 10]) sum on i only
np.einsum('ij,ji->i',a,b) # array([ 5, 10]) sum on j only
np.einsum('ij,ji',a,b) # 15 sum on i and j

A while back I worked out a pure Python equivalent to einsum, with most of focus on how it parsed the string. The goal is the create an nditer with which it does a sum of products calculation. But it's not a trivial script to follow, even in Python:

https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py


A simpler sequence showing these summation rules:

In [53]: c=np.array([[1,2],[3,4]])
In [55]: np.einsum('ij',c)
Out[55]: 
array([[1, 2],
       [3, 4]])
In [56]: np.einsum('ij->i',c)
Out[56]: array([3, 7])
In [57]: np.einsum('ij->j',c)
Out[57]: array([4, 6])
In [58]: np.einsum('ij->',c)
Out[58]: 10

Using arrays that don't have a 1 dimension removes the broadcasting complication:

In [71]: b2=np.arange(1,7).reshape(2,3)
In [72]: np.einsum('ij,ji',a2,b2)
...
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,3)->(2,3) (2,3)->(3,2) 

Or should I say, it exposes the attempted broadcasting.

Ellipsis adds a level of complexity to the einsum interpretation. I developed the above mentioned github code when I solved a bug in the uses of .... But I didn't put much effort into refining the documentation.

Ellipsis broadcasting in numpy.einsum


The ellipses are most useful when you want an expression that can handle various sizes of arrays. If your arrays always 2D, it doesn't do anything extra.

By way of example, consider a generalization of the dot, one that multiplies the last dimension of A with the 2nd to the last of B. With ellipsis we can write an expression that can handle a mix of 2d, 3D and larger arrays:

np.einsum('...ij,...jk',np.ones((2,3)),np.ones((3,4)))  # (2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((3,4)))  # (5,2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((5,3,4))) # (5,2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((7,5,3,4)))  # (7,5,2,4)
np.einsum('...ij,...jk->...ik',np.ones((5,2,3)),np.ones((7,5,3,4)) # (7, 5, 2, 4)

The last expression uses the default right hand side indexing ...ik, ellipsis plus the non-summing indices.

Your original example could be written as

np.einsum('...j,j...->...j',a,b)

Effectively it fills in the i (or more dimensions) to match the dimensions of the arrays.

which would also work if a or b was 1d:

np.einsum('...j,j...->...j',a,b[0,:])

np.dot way of generalizing to larger dimensions is different

dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])

is expressed in einsum as:

np.einsum('ijo,kom->ijkm',np.ones((2,3,4)),np.ones((3,4,2)))

which can be generalized with

np.einsum('...o,kom->...km',np.ones((4,)),np.ones((3,4,2)))

or

np.einsum('ijo,...om->ij...m',np.ones((2,3,4)),np.ones((3,4,2)))

But I don't think I can completely replicate it in einsum. That is, I can't tell it to fill in indices for A, followed by different ones for B.

like image 128
hpaulj Avatar answered Jan 03 '23 00:01

hpaulj