I have two ndarrays with shapes:
A = (32,512,640)
B = (4,512)
I need to multiply A and B such that I get a new ndarray:
C = (4,32,512,640)
Another way to think of it is that each row of vector B is multiplied along axis=-2 of A, which results in a new 1,32,512,640 cube. Each row of B can be looped over forming 1,32,512,640 cubes, which can then be used to build C up by using np.concatenate
or np.vstack
, such as:
# Sample inputs, where the dimensions aren't necessarily known
a = np.arange(32*512*465, dtype='f4').reshape((32,512,465))
b = np.ones((4,512), dtype='f4')
# Using a loop
d = []
for row in b:
d.append(np.expand_dims(row[None,:,None]*a, axis=0))
# Or using list comprehension
d = [np.expand_dims(row[None,:,None]*a,axis=0) for row in b]
# Stacking the final list
result = np.vstack(d)
But I am wondering if it's possible to use something like np.einsum
or np.tensordot
to get this vectorized all in one line. I'm still learning how to use those two methods, so I'm not sure if it's appropriate here.
Thanks!
Two types of multiplication involving two vectors are defined: the so-called scalar product (or "dot product") and the so-called vector product (or "cross product").
matmul and both outperform np. dot .
First, multiply Row 1 of the matrix by Column 1 of the vector. Next, multiply Row 2 of the matrix by Column 1 of the vector. Finally multiply Row 3 of the matrix by Column 1 of the vector.
To multiply a vector by a scalar, multiply each component by the scalar. If →u=⟨u1,u2⟩ has a magnitude |→u| and direction d , then n→u=n⟨u1,u2⟩=⟨nu1,nu2⟩ where n is a positive real number, the magnitude is |n→u| , and its direction is d .
We can leverage broadcasting
after extending the dimensions of B
with None/np.newaxis
-
C = A * B[:,None,:,None]
With einsum
, it would be -
C = np.einsum('ijk,lj->lijk',A,B)
There's no sum-reduction happening here, so einsum
won't be any better than the explicit-broadcasting
one. But since, we are looking for Pythonic solution, that could be used, once we get past its string notation.
Let's get some timings to finish things off -
In [15]: m,n,r,p = 32,512,640,4
...: A = np.random.rand(m,n,r)
...: B = np.random.rand(p,n)
In [16]: %timeit A * B[:,None,:,None]
10 loops, best of 3: 80.9 ms per loop
In [17]: %timeit np.einsum('ijk,lj->lijk',A,B)
10 loops, best of 3: 109 ms per loop
# Original soln
In [18]: %%timeit
...: d = []
...: for row in B:
...: d.append(np.expand_dims(row[None,:,None]*A, axis=0))
...:
...: result = np.vstack(d)
10 loops, best of 3: 130 ms per loop
Leverage multi-core
We could leverage multi-core capability of numexpr
, which is suited for arithmetic operations
and large data
and thus gain some performance boost here. Let's time with it -
In [42]: import numexpr as ne
In [43]: B4D = B[:,None,:,None] # this is virtually free
In [44]: %timeit ne.evaluate('A*B4D')
10 loops, best of 3: 64.6 ms per loop
In one-line as : ne.evaluate('A*B4D',{'A':A,'B4D' :B[:,None,:,None]})
.
Related post
on how to control multi-core functionality.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With