Building and updating a sparse matrix in python using scipy

I'm trying to build and update a sparse matrix as I read data from a file. The matrix is of size 100000 x 40000.

What is the most efficient way of updating multiple entries of the sparse matrix? Specifically, I need to increment each entry by 1.

Let's say I have row indices [2, 236, 246, 389, 1691]

and column indices [117, 3, 34, 2757, 74, 1635, 52]

so all the following entries must be incremented by one:

(2,117) (2,3) (2,34) (2,2757) ...

(236,117) (236,3) (236, 34) (236,2757) ...

and so on.

I'm already using lil_matrix, because I got a warning recommending it when I tried to update a single entry.

However, lil_matrix does not support updating multiple entries at once: matrix[1:3,0] += [2,3] raises a NotImplementedError.

I can do this naively, by incrementing every entry individually. I was wondering whether there is a better way to do this, or a better sparse matrix implementation that I can use.

My computer is an average i5 machine with 4GB RAM, so I have to be careful not to blow it up :)

Creating a second matrix with 1s in your new coordinates and adding it to the existing one is a possible way of doing this:

>>> import numpy as np
>>> import scipy.sparse as sps
>>> rows, cols = 1000, 2000
>>> sps_acc = sps.coo_matrix((rows, cols)) # empty matrix
>>> for j in range(100): # add 100 sets of 100 1's
...     r = np.random.randint(rows, size=100)
...     c = np.random.randint(cols, size=100)
...     d = np.ones((100,))
...     sps_acc = sps_acc + sps.coo_matrix((d, (r, c)), shape=(rows, cols))
... 
>>> sps_acc
<1000x2000 sparse matrix of type '<type 'numpy.float64'>'
    with 9985 stored elements in Compressed Sparse Row format>
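
A possible variation on the same idea, when the batches arrive one at a time while reading a file, is to collect the coordinates and build the matrix once at the end. This is only a sketch: `accumulate_batches` and the toy `batches` list are hypothetical names, not part of the answer above. It relies on the documented behaviour that duplicate (row, col) entries of a COO matrix are summed when it is converted to CSR.

```python
import numpy as np
import scipy.sparse as sps

def accumulate_batches(batches, shape):
    """Count how often each (row, col) pair occurs across all batches.

    `batches` is an iterable of (row_indices, col_indices) pairs of equal
    length; this helper is hypothetical and only illustrates the idea.
    """
    row_parts, col_parts = [], []
    for r, c in batches:
        row_parts.append(np.asarray(r, dtype=np.intp))
        col_parts.append(np.asarray(c, dtype=np.intp))
    if not row_parts:
        # No updates at all: return an empty matrix of the requested shape.
        return sps.csr_matrix(shape, dtype=np.int64)
    rows = np.concatenate(row_parts)
    cols = np.concatenate(col_parts)
    data = np.ones(rows.shape[0], dtype=np.int64)
    # Duplicate (row, col) pairs are summed during the COO -> CSR conversion.
    return sps.coo_matrix((data, (rows, cols)), shape=shape).tocsr()

# Toy input: the pair (1, 3) appears three times, (0, 2) once.
batches = [([0, 1, 1], [2, 3, 3]), ([1], [3])]
acc = accumulate_batches(batches, shape=(4, 5))
```

This trades one sparse addition per batch for keeping all coordinates in memory until the end, so on a machine with little RAM it may be worth flushing into an accumulator matrix every so many batches.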

import scipy.sparse

rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]
prod = [(x, y) for x in rows for y in cols] # combinations
r = [x for (x, y) in prod] # x_coordinate
c = [y for (x, y) in prod] # y_coordinate
data = [1] * len(r)
m = scipy.sparse.coo_matrix((data, (r, c)), shape=(100000, 40000))

I think this works well and doesn't need an explicit loop over the individual entries. I am following the documentation directly. Inspecting m gives:

<100000x40000 sparse matrix of type '<type 'numpy.int32'>'
    with 35 stored elements in COOrdinate format>
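
The same row/column product can probably be built with NumPy instead of list comprehensions and then added to a running total, which is what the question asks for. The following is a minimal sketch, assuming the accumulator is kept in CSR format; `increment_block` is a hypothetical helper name, not a SciPy function.

```python
import numpy as np
import scipy.sparse as sps

def increment_block(acc, rows, cols):
    """Return acc with every (row, col) pair from rows x cols incremented by 1."""
    rows = np.asarray(rows, dtype=np.intp)
    cols = np.asarray(cols, dtype=np.intp)
    r = np.repeat(rows, len(cols))  # each row index repeated once per column
    c = np.tile(cols, len(rows))    # the full column list repeated once per row
    data = np.ones(len(r), dtype=acc.dtype)
    update = sps.coo_matrix((data, (r, c)), shape=acc.shape)
    # Adding the one-off COO update to the accumulator; keep the result in CSR.
    return (acc + update).tocsr()

acc = sps.csr_matrix((100000, 40000), dtype=np.int32)  # empty accumulator
rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]
acc = increment_block(acc, rows, cols)
acc = increment_block(acc, rows, cols)  # applying the same update a second time
```

After the two calls, each of the 35 touched entries holds 2 and every other entry is still an implicit zero.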

syllogismos

2 Answers

Jaime

Ray
