I have a boolean matrix with 1.5E6
rows and 20E3
columns, similar to this example:
M = [[ True, True, False, True, ...],
[False, True, True, True, ...],
[False, False, False, False, ...],
[False, True, False, False, ...],
...
[ True, True, False, False, ...]
]
Also, I have another matrix N
( 1.5E6
rows, 1
column):
N = [[ True],
[False],
[ True],
[ True],
...
[ True]
]
What I need to do, is to go through each column pair from matrix M
(1&1, 1&2, 1&3, 1&N, 2&1, 2&2 etc) combined by the AND
operator, and count how many overlaps there are between the result and matrix N
.
My Python/Numpy code would look like this:
for i in range(M.shape[1]):
for j in range(M.shape[1]):
result = M[:,i] & M[:,j] # Combine the columns with AND operator
count = np.sum(result & N.ravel()) # Counts the True occurrences
... # Save the count for the i and j pair
The problem is, going through 20E3 x 20E3
combinations with two for loops is computationally expensive (takes around 5-10 days to compute). A better option I tried is comparing each column to the whole matrix M:
for i in range(M.shape[1]):
result = M[:,i]*M.shape[1] & M # np.tile or np.repeat is used to horizontally repeat the column
counts = np.sum(result & N*M.shape[1], axis=0)
... # Save the counts
This reduces overhead and calculation time to around 10%, but it's still taking 1 day or so to compute.
My question would be :
what is the fastest way (non Python maybe?) to make these calculations (basically just AND
and SUM
)?
I was thinking about low level languages, GPU processing, quantum computing etc.. but I don't know much about any of these so any advice regarding the direction is appreciated!
Additional thoughts: Currently thinking if there is a fast way using the dot product (as Davikar proposed) for computing triplets of combinations:
def compute(M, N):
out = np.zeros((M.shape[1], M.shape[1], M.shape[1]), np.int32)
for i in range(M.shape[1]):
for j in range(M.shape[1]):
for k in range(M.shape[1]):
result = M[:, i] & M[:, j] & M[:, k]
out[i, j, k] = np.sum(result & N.ravel())
return out
Simply use np.einsum
to get all the counts -
np.einsum('ij,ik,i->jk',M,M.astype(int),N.ravel())
Feel free to play around with optimize
flag with np.einsum
. Also, feel free to play around with different dtypes conversion.
To leverage GPU, we can use tensorflow
package that also supports einsum
.
Faster alternatives with np.dot
:
(M&N).T.dot(M.astype(int))
(M&N).T.dot(M.astype(np.float32))
Timings -
In [110]: np.random.seed(0)
...: M = np.random.rand(500,300)>0.5
...: N = np.random.rand(500,1)>0.5
In [111]: %timeit np.einsum('ij,ik,i->jk',M,M.astype(int),N.ravel())
...: %timeit (M&N).T.dot(M.astype(int))
...: %timeit (M&N).T.dot(M.astype(np.float32))
227 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
66.8 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
3.26 ms ± 753 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
And take it a bit further with float32 conversions for both of the boolean arrays -
In [122]: %%timeit
...: p1 = (M&N).astype(np.float32)
...: p2 = M.astype(np.float32)
...: out = p1.T.dot(p2)
2.7 ms ± 34.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With