I have two NxN matrices that I want to multiply together: A and B. In NumPy, I used:
import numpy as np
C = np.dot(A, B)
However, I happen to know that for matrix B only row n and column n are non-zero (this comes directly from the analytical formula that produced the matrix and is without a doubt always the case).
Hoping to take advantage of this fact and reduce the number of multiplications needed to produce C, I replaced the above with:
import numpy as np
C = np.empty((N, N))
for row in range(0, N):
    for col in range(0, N):
        if col != n:
            C[row, col] = A[row, n]*B[n, col]  # just one scalar multiplication
        else:
            C[row, col] = np.dot(A[row], B[:, n])
Analytically, this should reduce the total complexity as follows. In the general case (no fancy tricks, just basic matrix multiplication), C = AB, where A and B are both NxN, is O(N^3): all N rows must multiply all N columns, and each of these dot products contains N multiplications => O(N*N*N) = O(N^3).
Exploiting the structure of B as I've done above, however, should be O(N^2 + N^2) = O(2N^2) = O(N^2). That is, all N rows must still meet all N columns, but for every column except B[:, n] only one scalar multiplication is required, since only one element of B[:, m] is non-zero for m != n. When m == n, which occurs N times (once for each row of A that must multiply column n of B), N scalar multiplications are needed.
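As a sanity check on that counting argument, a tiny sketch (the value of N here is arbitrary, for illustration only) tallies the scalar multiplications each approach performs:

```python
# Count scalar multiplications for a hypothetical size N (not a benchmark).
N = 100

# Structured loop: for each of the N rows of A,
#   N - 1 columns (col != n) cost one scalar multiplication each,
#   and column n costs a full length-N dot product (N multiplications).
structured = N * ((N - 1) * 1 + N)   # = 2*N**2 - N, i.e. O(N^2)

# Naive matrix multiplication: N rows x N columns x N multiplications each.
naive = N * N * N                    # = O(N^3)

print(structured)  # 19900
print(naive)       # 1000000
```

For N = 100 that is a ~50x reduction in scalar multiplications, which is why the asymptotic argument in the question is sound even though the pure-Python loop loses in practice.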
However, the first block of code (using np.dot(A, B)) is substantially faster. I'm aware (via questions like "Why is matrix multiplication faster with numpy than with ctypes in Python?") that the low-level implementation of np.dot is likely to blame for this. So my question is this: how can I exploit the structure of matrix B to improve multiplication efficiency without sacrificing the implementation efficiency of NumPy, and without building my own low-level matrix multiplication in C?
This method is part of a numerical optimization over many variables, hence, O(N^3) is intractable whereas O(N^2) will likely get the job done.
Thank you for any help. Also, I'm new to SO, so please pardon any newbie errors.
If I understood A and B correctly, then I do not understand the for loops, and why you are not just multiplying by the two non-zero vectors:
# say A & B are like this:
n, N = 3, 5
A = np.random.randn(N, N)
B = np.zeros_like(A)
B[n] = np.random.randn(N)
B[:, n] = np.random.randn(N)
Take the non-zero row and column of B:
rowb, colb = B[n,:], np.copy( B[:,n] )
colb[ n ] = 0
Multiply A by those two vectors:
X = np.outer( A[:,n], rowb )
X[:,n] += np.dot( A, colb )
To verify, check that this is (numerically) zero:
X - np.dot( A, B )
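Putting the pieces together, the whole approach is one outer product plus one matrix-vector product (a self-contained sketch using the same names as above; B[n, n] is zeroed in the column copy so the shared element is not counted twice):

```python
import numpy as np

n, N = 3, 5
A = np.random.randn(N, N)
B = np.zeros_like(A)
B[n] = np.random.randn(N)
B[:, n] = np.random.randn(N)

# B is rank-2: only row n and column n are non-zero, so
# A @ B = outer(A[:, n], B[n, :]) + (A @ colb) placed in column n,
# where colb is column n of B with the shared element B[n, n] removed.
rowb, colb = B[n, :], np.copy(B[:, n])
colb[n] = 0
X = np.outer(A[:, n], rowb)
X[:, n] += np.dot(A, colb)

print(np.allclose(X, np.dot(A, B)))  # True
```

Both pieces are O(N^2) NumPy primitives, so the asymptotic saving from the question is kept without dropping out of compiled code.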
With N=100:
%timeit np.dot(A, B)
1000 loops, best of 3: 1.39 ms per loop
%timeit colb = np.copy( B[:,n] ); colb[ n ] = 0; X = np.outer( A[:,n], B[n,:] ); X[:,n] += np.dot( A, colb )
10000 loops, best of 3: 98.5 µs per loop
I timed it, and using scipy.sparse is faster:
import numpy as np
from scipy import sparse
from timeit import timeit

A = np.random.rand(100, 100)
B = np.zeros(A.shape, dtype=float)  # np.float is deprecated; use the builtin
B[3] = np.random.rand(100)
B[:, 3] = np.random.rand(100)
sparse_B = sparse.csr_matrix(B)

n = 1000
t1 = timeit('np.dot(A, B)', 'from __main__ import np, A, B', number=n)
print('dense way : {}'.format(t1))
t2 = timeit('A * sparse_B', 'from __main__ import A, sparse_B', number=n)
print('sparse way : {}'.format(t2))
Result:
dense way : 1.15117192268
sparse way : 0.113152980804
>>> np.allclose(np.dot(A, B), A * sparse_B)
True
As the number of rows of B increases, so should the time advantage of multiplication using the sparse matrix.
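If B never needs to exist as a dense array at all, the CSR matrix can be built straight from the two non-zero vectors via COO coordinates (a sketch; row_vals, col_vals, and k are hypothetical names for the analytical row, column, and their index):

```python
import numpy as np
from scipy import sparse

N, k = 100, 3                  # k: index of the non-zero row/column of B
row_vals = np.random.rand(N)   # stand-in for row k of B
col_vals = np.random.rand(N)   # stand-in for column k of B

# COO coordinates: all of row k (except the shared element (k, k),
# which we take from col_vals to match the dense construction below),
# plus all of column k.
rows = np.concatenate([np.full(N - 1, k), np.arange(N)])
cols = np.concatenate([np.delete(np.arange(N), k), np.full(N, k)])
vals = np.concatenate([np.delete(row_vals, k), col_vals])

sparse_B = sparse.coo_matrix((vals, (rows, cols)), shape=(N, N)).tocsr()

# Same result as the dense construction used in the answer:
dense_B = np.zeros((N, N))
dense_B[k] = row_vals
dense_B[:, k] = col_vals
print(np.allclose(sparse_B.toarray(), dense_B))  # True
```

This avoids ever allocating the O(N^2) dense B, so both storage and the multiplication stay O(N).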