Building and updating a sparse matrix in python using scipy

I'm trying to build and update a sparse matrix as I read data from a file. The matrix is of size 100000 x 40000.

What is the most efficient way to update multiple entries of the sparse matrix? Specifically, I need to increment each entry by 1.

Let's say I have row indices [2, 236, 246, 389, 1691]

and column indices [117, 3, 34, 2757, 74, 1635, 52]

so all the following entries must be incremented by one:

(2,117) (2,3) (2,34) (2,2757) ...

(236,117) (236,3) (236, 34) (236,2757) ...

and so on.

I'm already using lil_matrix, since I got a warning recommending it when I tried to update a single entry.

However, the lil_matrix format does not support updating multiple entries at once: matrix[1:3,0] += [2,3] gives me a NotImplementedError.

I can do this naively, by incrementing every entry individually. I was wondering if there is a better way to do this, or a better sparse matrix implementation that I can use.

My computer is also an average i5 machine with 4GB RAM, so I have to be careful not to blow it up :)
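
For reference, the naive per-entry approach mentioned above might look like the following minimal sketch. The int32 dtype and the variable names are assumptions, not taken from the original post.

```python
import numpy as np
from scipy.sparse import lil_matrix

rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]

# Assumed dtype; the question does not state one.
m = lil_matrix((100000, 40000), dtype=np.int32)

# Increment every (row, column) pair one entry at a time.
for r in rows:
    for c in cols:
        m[r, c] += 1
```

This touches 5 x 7 = 35 entries per update, so the cost grows with the product of the two index lists.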

syllogismos asked Dec 14 '13 12:12


2 Answers

Creating a second matrix with 1s in your new coordinates and adding it to the existing one is a possible way of doing this:

>>> import numpy as np
>>> import scipy.sparse as sps
>>> rows, cols = 1000, 2000
>>> sps_acc = sps.coo_matrix((rows, cols)) # empty matrix
>>> for j in range(100): # add 100 sets of 100 1's
...     r = np.random.randint(rows, size=100)
...     c = np.random.randint(cols, size=100)
...     d = np.ones((100,))
...     sps_acc = sps_acc + sps.coo_matrix((d, (r, c)), shape=(rows, cols))
... 
>>> sps_acc
<1000x2000 sparse matrix of type '<type 'numpy.float64'>'
    with 9985 stored elements in Compressed Sparse Row format>
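
Applied to the row and column lists from the question, the same add-a-matrix idea could be sketched as below. The helper name increment_block, the int32 dtype, and the indices in the second call are illustrative assumptions; np.repeat and np.tile expand the two lists into every row/column pair.

```python
import numpy as np
import scipy.sparse as sps


def increment_block(acc, rows, cols):
    """Return acc with 1 added at every (r, c) pair in rows x cols.

    Hypothetical helper, not part of SciPy.
    """
    r = np.repeat(rows, len(cols))  # each row index repeated once per column
    c = np.tile(cols, len(rows))    # the column list repeated once per row
    d = np.ones(len(r), dtype=np.int32)
    return acc + sps.coo_matrix((d, (r, c)), shape=acc.shape)


# Accumulator with the shape from the question (dtype is an assumption).
acc = sps.csr_matrix((100000, 40000), dtype=np.int32)
acc = increment_block(acc, [2, 236, 246, 389, 1691],
                      [117, 3, 34, 2757, 74, 1635, 52])
acc = increment_block(acc, [2, 5], [117, 9])  # overlaps the first block at (2, 117)
```

Each call builds a new sum matrix, so with many small updates it may be cheaper to batch several sets of coordinates before adding.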
Jaime answered Oct 22 '22 19:10


import scipy.sparse

rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]
prod = [(x, y) for x in rows for y in cols] # all (row, column) combinations
r = [x for (x, y) in prod] # row index of each pair
c = [y for (x, y) in prod] # column index of each pair
data = [1] * len(r) # one increment per pair
m = scipy.sparse.coo_matrix((data, (r, c)), shape=(100000, 40000))

I think this works well and doesn't need a loop that updates entries one at a time. I am directly following the coo_matrix documentation. The resulting matrix:

<100000x40000 sparse matrix of type '<type 'numpy.int32'>'
    with 35 stored elements in COOrdinate format>
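
Because coo_matrix sums duplicate coordinates when it is converted to CSR or CSC, one possible extension is to collect the coordinates for every update while reading the file and build the matrix once at the end. The sketch below assumes a hypothetical list named batches standing in for the data read from the file; the int32 dtype is also an assumption.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical stand-in for the (rows, cols) pairs parsed from the file.
batches = [
    ([2, 236], [117, 3]),
    ([2, 389], [117, 52]),
]

all_r, all_c = [], []
for rows, cols in batches:
    all_r.append(np.repeat(rows, len(cols)))  # row index for every pair
    all_c.append(np.tile(cols, len(rows)))    # column index for every pair

r = np.concatenate(all_r)
c = np.concatenate(all_c)
data = np.ones(len(r), dtype=np.int32)

# Duplicate (row, column) pairs are summed during the conversion to CSR.
m = coo_matrix((data, (r, c)), shape=(100000, 40000)).tocsr()
```

Here (2, 117) appears in both batches, so its final count is 2. The coordinate arrays grow with the total number of increments, which is worth watching on a machine with limited RAM.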
Ray answered Oct 22 '22 17:10