Assume I have matrices P with the size [4, 4]
which partitioned (block) into 4 smaller matrices [2,2]
. How can I efficiently multiply this block-matrix into another matrix (not partitioned matrix but smaller)?
Let's Assume our original matric is:
P = [ 1 1 2 2
1 1 2 2
3 3 4 4
3 3 4 4]
Which split into submatrices:
P_1 = [1 1 , P_2 = [2 2 , P_3 = [3 3 P_4 = [4 4
1 1] 2 2] 3 3] 4 4]
Now our P is:
P = [P_1 P_2
P_3 p_4]
In the next step, I want to do element-wise multiplication between P and smaller matrices which its size is equal to number of sub-matrices:
P * [ 1 0 = [P_1 0 = [1 1 0 0
0 0 ] 0 0] 1 1 0 0
0 0 0 0
0 0 0 0]
You can think of representing your large block matrix in a more efficient way.
For instance, a block matrix
P = [ 1 1 2 2
1 1 2 2
3 3 4 4
3 3 4 4]
Can be represented using
a = [ 1 0 b = [ 1 1 0 0 p = [ 1 2
1 0 0 0 1 1 ] 3 4 ]
0 1
0 1 ]
As
P = a @ p @ b
With (@
representing matrix multiplication). Matrices a
and b
represents/encode the block structure of P
and the small p
represents the values of each block.
Now, if you want to multiply (element-wise) p
with a small (2x2) matrix q
you simply
a @ (p * q) @ b
A simple pytorch example
In [1]: a = torch.tensor([[1., 0], [1., 0], [0., 1], [0, 1]])
In [2]: b = torch.tensor([[1., 1., 0, 0], [0, 0, 1., 1]])
In [3]: p=torch.tensor([[1., 2.], [3., 4.]])
In [4]: q = torch.tensor([[1., 0], [0., 0]])
In [5]: a @ p @ b
Out[5]: tensor([[1., 1., 2., 2.], [1., 1., 2., 2.], [3., 3., 4., 4.], [3., 3., 4., 4.]])
In [6]: a @ (p*q) @ b
Out[6]: tensor([[1., 1., 0., 0.], [1., 1., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]])
I leave it to you as an exercise how to efficiently produce the "structure" matrices a
and b
given the sizes of the blocks.
Following is a general Tensorflow-based solution that works for input matrices p
(large) and m
(small) of arbitrary shapes as long as the sizes of p
are divisible by the sizes of m
on both axes.
def block_mul(p, m):
p_x, p_y = p.shape
m_x, m_y = m.shape
m_4d = tf.reshape(m, (m_x, 1, m_y, 1))
m_broadcasted = tf.broadcast_to(m_4d, (m_x, p_x // m_x, m_y, p_y // m_y))
mp = tf.reshape(m_broadcasted, (p_x, p_y))
return p * mp
Test:
import tensorflow as tf
tf.enable_eager_execution()
p = tf.reshape(tf.constant(range(36)), (6, 6))
m = tf.reshape(tf.constant(range(9)), (3, 3))
print(f"p:\n{p}\n")
print(f"m:\n{m}\n")
print(f"block_mul(p, m):\n{block_mul(p, m)}")
Output (Python 3.7.3, Tensorflow 1.13.1):
p:
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
m:
[[0 1 2]
[3 4 5]
[6 7 8]]
block_mul(p, m):
[[ 0 0 2 3 8 10]
[ 0 0 8 9 20 22]
[ 36 39 56 60 80 85]
[ 54 57 80 84 110 115]
[144 150 182 189 224 232]
[180 186 224 231 272 280]]
Another solution that uses implicit broadcasting is the following:
def block_mul2(p, m):
p_x, p_y = p.shape
m_x, m_y = m.shape
p_4d = tf.reshape(p, (m_x, p_x // m_x, m_y, p_y // m_y))
m_4d = tf.reshape(m, (m_x, 1, m_y, 1))
return tf.reshape(p_4d * m_4d, (p_x, p_y))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With