I'm trying to build and update a sparse matrix as I read data from file.
The matrix is of size 100000 x 40000.
What is the most efficient way of updating multiple entries of the sparse matrix? Specifically, I need to increment each entry by 1.
Let's say I have row indices [2, 236, 246, 389, 1691]
and column indices [117, 3, 34, 2757, 74, 1635, 52]
so all the following entries must be incremented by one:
(2,117) (2,3) (2,34) (2,2757) ...
(236,117) (236,3) (236, 34) (236,2757) ...
and so on.
I'm already using lil_matrix, as I got a warning telling me to use it when I tried to update a single entry. However, the lil_matrix format does not support updating multiple entries at once: matrix[1:3,0] += [2,3] gives me a NotImplementedError.
I can do this naively, by incrementing every entry individually. I was wondering if there is any better way to do this, or better sparse matrix implementation that I can use.
My computer is also an average i5 machine with 4GB RAM, so I have to be careful not to blow it up :)
Python's SciPy provides tools for creating sparse matrices using multiple data structures, as well as tools for converting a dense matrix to a sparse matrix. Printing a sparse matrix shows the (row, column) position of each non-zero entry along with its value.
SciPy has a module, scipy.sparse, that provides functions to deal with sparse data. Two commonly used formats are CSC (Compressed Sparse Column) and CSR (Compressed Sparse Row).
Creating a second matrix with 1s at your new coordinates and adding it to the existing one is a possible way of doing this:
>>> import numpy as np
>>> import scipy.sparse as sps
>>> rows, cols = 1000, 2000
>>> sps_acc = sps.coo_matrix((rows, cols)) # empty matrix
>>> for j in range(100): # add 100 sets of 100 1's
...     r = np.random.randint(rows, size=100)
...     c = np.random.randint(cols, size=100)
...     d = np.ones((100,))
...     sps_acc = sps_acc + sps.coo_matrix((d, (r, c)), shape=(rows, cols))
...
>>> sps_acc
<1000x2000 sparse matrix of type '<type 'numpy.float64'>'
with 9985 stored elements in Compressed Sparse Row format>
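If all the index batches fit in memory, one possible variation is to collect the coordinates first and build the matrix once, instead of adding a new sparse matrix on every iteration. The following is only a sketch under that assumption: build_count_matrix and the batches list are hypothetical names for illustration, and the batches reuse the indices from the question. It relies on the fact that converting a COO matrix to CSR sums entries that share the same (row, column).

```python
import numpy as np
import scipy.sparse as sps

def build_count_matrix(batches, shape):
    """Build a sparse count matrix from (row_indices, col_indices) batches.

    Hypothetical helper: each batch increments every (row, col) combination by 1.
    """
    all_r, all_c = [], []
    for rows, cols in batches:
        rows = np.asarray(rows)
        cols = np.asarray(cols)
        # Cartesian product: pair every row index with every column index.
        all_r.append(np.repeat(rows, len(cols)))
        all_c.append(np.tile(cols, len(rows)))
    if not all_r:
        return sps.csr_matrix(shape, dtype=np.int32)
    r = np.concatenate(all_r)
    c = np.concatenate(all_c)
    data = np.ones(len(r), dtype=np.int32)
    # COO -> CSR conversion sums duplicate (row, col) entries.
    return sps.coo_matrix((data, (r, c)), shape=shape).tocsr()

# Example batches (assumed data, based on the indices in the question).
batches = [
    ([2, 236, 246, 389, 1691], [117, 3, 34, 2757, 74, 1635, 52]),
    ([2, 236], [117, 3]),
]
counts = build_count_matrix(batches, (100000, 40000))
```

With limited RAM, it may be safer to flush the collected coordinates into the matrix every so often rather than holding every batch until the end.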
import scipy.sparse
rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]
prod = [(x, y) for x in rows for y in cols] # combinations
r = [x for (x, y) in prod] # x_coordinate
c = [y for (x, y) in prod] # y_coordinate
data = [1] * len(r)
m = scipy.sparse.coo_matrix((data, (r, c)), shape=(100000, 40000))
I think this works well and doesn't need explicit for loops. I am following the documentation directly:
<100000x40000 sparse matrix of type '<type 'numpy.int32'>'
with 35 stored elements in COOrdinate format>
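Note that the snippet above creates a new matrix rather than updating an existing one. A minimal sketch of how the same idea could be used for incremental updates is shown here, assuming the accumulator is kept in CSR format. The name increment_block is hypothetical, and np.meshgrid is swapped in for the list comprehensions to generate the combinations.

```python
import numpy as np
import scipy.sparse as sps

def increment_block(acc, rows, cols):
    """Return acc with every (row, col) combination incremented by 1.

    Hypothetical helper; acc is assumed to be a scipy.sparse CSR matrix.
    """
    # indexing="ij" gives one grid entry per (row, col) pair.
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    r, c = rr.ravel(), cc.ravel()
    ones = np.ones(r.size, dtype=acc.dtype)
    update = sps.coo_matrix((ones, (r, c)), shape=acc.shape).tocsr()
    return acc + update

acc = sps.csr_matrix((100000, 40000), dtype=np.int32)  # empty accumulator
acc = increment_block(acc, [2, 236, 246, 389, 1691],
                      [117, 3, 34, 2757, 74, 1635, 52])
acc = increment_block(acc, [2, 236], [117, 3])  # overlapping second update
```

Each call builds a small sparse matrix of ones and adds it to the accumulator, so entries hit by more than one update end up with the summed count.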