The correct way of writing a summation in terms of Einstein summation is a puzzle to me, so I want to try it in my code. I have succeeded in a few cases but mostly with trial and error.
Now there is a case that I cannot figure out. First, a basic question. For two matrices A
and B
that are Nx1
and 1xN
, respectively, AB
is NxN
but BA
is 1x1
. When I want to calculate the NxN
case with np.einsum
I can do:
import numpy as np
a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print np.einsum('ij,ji->ij', a, b)
and the final array is 2x2
. However
a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print np.einsum('ij,ij->ij', a, b)
returns a 1x2
array. I don't quite understand why this does not give the correct result.
For example for the above case numpy
's guide says that arrows can be used to force summation or stop it from taking place. But that seems quite vague to me; in the above case I don't understand how numpy decides about the final size of the output array based on the order of indices (which apparently changes).
Formally I know the following: When there is nothing on the right side of the arrow, one can write the summation mathematically as $\sum\limits_{i=0}^{N}\sum\limits_{j=0}^{M} A_{ij}B_{ij}$
for np.einsum('ij,ij',A,B)
, but when there is an arrow I am clueless how to interpret it in terms of a formal mathematical expression.
In [22]: a
Out[22]: array([[1, 2]])
In [23]: b
Out[23]: array([[2, 3]])
In [24]: np.einsum('ij,ij->ij',a,b)
Out[24]: array([[2, 6]])
In [29]: a*b
Out[29]: array([[2, 6]])
Here the repetition of the indices in all parts, including output, is interpreted as element by element multiplication. Nothing is summed. a[i,j]*b[i,j] = c[i,j]
for all i,j
.
In [25]: np.einsum('ij,ji->ij',a,b)
Out[25]:
array([[2, 4],
[3, 6]])
In [28]: np.dot(a.T,b).T
Out[28]:
array([[2, 4],
[3, 6]])
In [38]: np.outer(a,b)
Out[38]:
array([[2, 3],
[4, 6]])
Again no summation because the same indices appear on left and right sides. a[i,j]*b[j,i] = c[i,j]
, in other words:
[[1*2, 2*2],
[1*3, 2*3]]
In effect an outer product. A look at how a
is broadcasted against b.T
might help:
In [69]: np.broadcast_arrays(a,b.T)
Out[69]:
[array([[1, 2],
[1, 2]]),
array([[2, 2],
[3, 3]])]
On the left side of the statement, repeated indices indicate which dimensions are multiplied. Matching left and right sides determines whether they are summed or not.
np.einsum('ij,ji->j',a,b) # array([ 5, 10]) sum on i only
np.einsum('ij,ji->i',a,b) # array([ 5, 10]) sum on j only
np.einsum('ij,ji',a,b) # 15 sum on i and j
A while back I worked out a pure Python equivalent to einsum
, with most of focus on how it parsed the string. The goal is the create an nditer
with which it does a sum of products calculation. But it's not a trivial script to follow, even in Python:
https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py
A simpler sequence showing these summation rules:
In [53]: c=np.array([[1,2],[3,4]])
In [55]: np.einsum('ij',c)
Out[55]:
array([[1, 2],
[3, 4]])
In [56]: np.einsum('ij->i',c)
Out[56]: array([3, 7])
In [57]: np.einsum('ij->j',c)
Out[57]: array([4, 6])
In [58]: np.einsum('ij->',c)
Out[58]: 10
Using arrays that don't have a 1
dimension removes the broadcasting complication:
In [71]: b2=np.arange(1,7).reshape(2,3)
In [72]: np.einsum('ij,ji',a2,b2)
...
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,3)->(2,3) (2,3)->(3,2)
Or should I say, it exposes the attempted broadcasting.
Ellipsis adds a level of complexity to the einsum interpretation. I developed the above mentioned github code when I solved a bug in the uses of ...
. But I didn't put much effort into refining the documentation.
Ellipsis broadcasting in numpy.einsum
The ellipses are most useful when you want an expression that can handle various sizes of arrays. If your arrays always 2D, it doesn't do anything extra.
By way of example, consider a generalization of the dot
, one that multiplies the last dimension of A
with the 2nd to the last of B
. With ellipsis we can write an expression that can handle a mix of 2d, 3D and larger arrays:
np.einsum('...ij,...jk',np.ones((2,3)),np.ones((3,4))) # (2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((3,4))) # (5,2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((5,3,4))) # (5,2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((7,5,3,4))) # (7,5,2,4)
np.einsum('...ij,...jk->...ik',np.ones((5,2,3)),np.ones((7,5,3,4)) # (7, 5, 2, 4)
The last expression uses the default right hand side indexing ...ik
, ellipsis plus the non-summing indices.
Your original example could be written as
np.einsum('...j,j...->...j',a,b)
Effectively it fills in the i
(or more dimensions) to match the dimensions of the arrays.
which would also work if a
or b
was 1d:
np.einsum('...j,j...->...j',a,b[0,:])
np.dot
way of generalizing to larger dimensions is different
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
is expressed in einsum as:
np.einsum('ijo,kom->ijkm',np.ones((2,3,4)),np.ones((3,4,2)))
which can be generalized with
np.einsum('...o,kom->...km',np.ones((4,)),np.ones((3,4,2)))
or
np.einsum('ijo,...om->ij...m',np.ones((2,3,4)),np.ones((3,4,2)))
But I don't think I can completely replicate it in einsum
. That is, I can't tell it to fill in indices for A
, followed by different ones for B
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With