I'm interested in the best/fastest way to do array operations (dot, outer, add, etc.) while ignoring some values in the array. I'm mostly interested in cases where some (maybe 30%-50%) of the values are ignored and are effectively zero, with moderately large arrays of maybe 100,000 to 1,000,000 elements. There are a number of solutions I can think of, but none seem to really benefit from the possible advantages of being able to ignore some values. For example:
import numpy as np
dim = 1000 # arrays are dim x dim
A = np.ones((dim, dim)) # the array to modify
B = np.random.randint(0, 2, (dim, dim)) # 0/1 array; the values to ignore are 0
C = np.array(B, dtype=bool) # boolean version of B: True = use, False = ignore
D = np.random.random((dim, dim)) # the array which will be used to modify A
# Option 1: zero some values using multiplication.
# some initial tests show this is the fastest
A += B * D
# Option 2: use indexing
# this seems to be the slowest
A[C] += D[C]
# Option 3: use masked arrays
# B - 1 is nonzero (True = masked) exactly where B == 0
A = np.ma.array(np.ones((dim, dim)), mask=np.array(B - 1, dtype=bool))
A += D
edit1:
As suggested by cyborg, sparse arrays may be another option. Unfortunately I'm not very familiar with the package and am unable to get the speed advantages that I might be able to. For example, if I have a weighted graph with restricted connectivity defined by a sparse matrix A, another sparse matrix B which defines the connectivity (1 = connected, 0 = not connected), and a dense numpy matrix C, I'd like to be able to do something like A = A + B.multiply(C) and take advantage of A and B being sparse.
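A minimal sketch of that update, assuming SciPy's `scipy.sparse` package and CSR storage. The array size, the 5% density, and the variable `update` are illustrative choices, not part of the original problem. Wrapping the product in `sparse.csr_matrix(...)` is a precaution, since the return type of `multiply` with a dense argument may differ between SciPy versions.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dim = 500  # illustrative size

# B: connectivity pattern (1 = connected, 0 = not connected), about 5% dense
mask = rng.random((dim, dim)) < 0.05
B = sparse.csr_matrix(mask.astype(float))

# A: sparse edge weights, here initialised to 1 on every connected entry
A = B.copy()

# C: dense matrix of update values
C = rng.random((dim, dim))

# Element-wise product; only entries stored in B can be nonzero in the result
update = sparse.csr_matrix(B.multiply(C))

# Sparse + sparse stays sparse
A = A + update

# Dense reference result, used only to check the sketch
expected = mask * (1.0 + C)
print(sparse.issparse(A), np.allclose(A.toarray(), expected))
```

Whether this beats the dense `A += B * D` depends on the density and on whether the cost of building the sparse matrices is included.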
With a sparse matrix you can generally expect an improvement if the density is less than about 10%. At higher densities a sparse matrix may still be faster, depending on whether you include the time required to build the matrix.
import timeit
setup=\
'''
import numpy as np
dim=1000
A = np.ones((dim, dim)) # the array to modify
B = np.random.randint(0, 2, (dim, dim)) # 0/1 array; the values to ignore are 0
C = np.array(B, dtype=bool)
D = np.random.random((dim, dim)) # the array which will be used to modify A
'''
print('mult '+str(timeit.timeit('A += B * D', setup, number=3)))
print('index '+str(timeit.timeit('A[C] += D[C]', setup, number=3)))
setup3 = setup+\
'''
A = np.ma.array(np.ones((dim, dim)), mask=np.array(B - 1, dtype=bool))
'''
print('ma ' + str(timeit.timeit('A += D', setup3, number=3)))
setup4 = setup+\
'''
from scipy import sparse
S = sparse.csr_matrix(C)
DS = S.multiply(D)
'''
print('sparse- '+str(timeit.timeit('A += DS', setup4, number=3)))
setup5 = setup+\
'''
from scipy import sparse
'''
print('sparse+ '+str(timeit.timeit('S = sparse.csr_matrix(C); DS = S.multiply(D); A += DS', setup5, number=3)))
setup6 = setup+\
'''
from scipy import sparse
class Sparsemat(sparse.coo_matrix):
    def __iadd__(self, other):
        self.data += other.data
        return self
A = Sparsemat(sparse.rand(dim, dim, 0.5, 'coo')) # the array to modify
D = np.random.random((dim, dim)) # the array which will be used to modify A
anz = A.nonzero()
'''
stmt6=\
'''
DS = Sparsemat((D[anz[0],anz[1]], anz), shape=A.shape) # new graph based on random weights
A += DS
'''
print('sparse2 '+str(timeit.timeit(stmt6, setup6, number=3)))
Output:
mult 0.0248420299535
index 0.32025789431
ma 0.1067024434
sparse- 0.00996273276303
sparse+ 0.228869672266
sparse2 0.105496183846
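The timings above are all for roughly 50% density. To see how the comparison shifts with density, something like the following sketch could be used. The helper `time_update` and its parameters are hypothetical names introduced here, and unlike the benchmark above it keeps `A` sparse in the sparse path. Construction of the sparse matrices is excluded from the timing, and the numbers will vary with machine and library versions.

```python
import timeit
import numpy as np
from scipy import sparse

def time_update(density, dim=500, repeats=3):
    """Return (dense_seconds, sparse_seconds) for a masked update A + B * D."""
    rng = np.random.default_rng(0)
    mask = rng.random((dim, dim)) < density
    D = rng.random((dim, dim))          # dense update values
    B = mask.astype(float)              # dense 0/1 mask
    S = sparse.csr_matrix(B)            # the same mask stored as CSR
    A_dense = B.copy()
    A_sparse = S.copy()

    def dense_step():
        return A_dense + B * D

    def sparse_step():
        return A_sparse + sparse.csr_matrix(S.multiply(D))

    # Both paths must agree before the timings mean anything
    assert np.allclose(dense_step(), sparse_step().toarray())

    t_dense = timeit.timeit(dense_step, number=repeats)
    t_sparse = timeit.timeit(sparse_step, number=repeats)
    return t_dense, t_sparse

for density in (0.01, 0.1, 0.5):
    t_dense, t_sparse = time_update(density)
    print(f"density={density:.2f} dense={t_dense:.4f}s sparse={t_sparse:.4f}s")
```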
Edit: You can use the code above (setup6) to extend scipy.sparse.coo_matrix. It keeps the sparse format.
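For reference, here is the same subclass as a standalone sketch outside the timeit strings. The array size and the way the weights are generated are illustrative assumptions. The in-place add only works when both matrices store the same entries in the same order, which is why `DS` is built from `A.nonzero()`; it also assumes `A` holds no explicitly stored zeros.

```python
import numpy as np
from scipy import sparse

class Sparsemat(sparse.coo_matrix):
    """coo_matrix whose += adds the stored values directly."""
    def __iadd__(self, other):
        # Assumes other has the same sparsity pattern, in the same order
        self.data += other.data
        return self

dim = 200
rng = np.random.default_rng(0)

# About half the entries are connected; weights are strictly positive
mask = rng.random((dim, dim)) < 0.5
weights = np.where(mask, rng.random((dim, dim)) + 0.1, 0.0)
A = Sparsemat(sparse.coo_matrix(weights))   # the array to modify

D = rng.random((dim, dim))                  # dense update values
rows, cols = A.nonzero()

# Pick out only the values of D that sit on A's stored entries
DS = Sparsemat((D[rows, cols], (rows, cols)), shape=A.shape)

before = A.toarray()
A += DS                                     # A is still a Sparsemat
print(type(A).__name__, np.allclose(A.toarray(), before + np.where(mask, D, 0.0)))
```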