Removing diagonal elements from a sparse matrix in scipy

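Q: Is a diagonal matrix a sparse matrix?

Generally yes: an n x n diagonal matrix has at most n non-zero entries out of n², so it is a natural candidate for sparse storage. SciPy's DIA (diagonal) format stores a matrix diagonal by diagonal, and diagonals containing only zeros are omitted. For sparse matrices with very few non-zero diagonals, such as diagonal or tridiagonal matrices, the DIA format allows very fast arithmetic operations. Its main limitation is that looking up an individual element requires a blind search through the offsets array.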
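To see the "diagonals containing only zeros are omitted" behaviour, here is a small sketch using SciPy's dia_matrix. The matrix values are made up for illustration; the point is that only the offsets of diagonals that actually hold non-zeros end up in the offsets array.

```python
import numpy as np
from scipy import sparse

# Illustrative 4x4 matrix: non-zeros on the main diagonal (offset 0) and the
# first superdiagonal (offset 1); the subdiagonal (offset -1) is all zero.
dense = np.diag([1.0, 2.0, 3.0, 4.0]) + np.diag([5.0, 6.0, 7.0], k=1)

m = sparse.dia_matrix(dense)

# Only the diagonals that contain non-zeros are stored.
print(m.offsets)     # [0 1]
# data has one row per stored diagonal.
print(m.data.shape)  # (2, 4)
```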

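Q: What does scipy.sparse.csr_matrix do?

csr_matrix() creates a sparse matrix in Compressed Sparse Row (CSR) format, whereas csc_matrix() creates one in Compressed Sparse Column (CSC) format. CSR stores the non-zero values row by row in three arrays (data, indices, indptr); this makes row slicing and matrix-vector products efficient, but makes changes to the sparsity structure expensive.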
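A minimal sketch of what the three CSR arrays look like, next to the CSC arrays for the same (arbitrary, illustrative) matrix. It only reads the public data, indices and indptr attributes.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 0]])

r = csr_matrix(dense)
# CSR: non-zeros stored row by row; indices holds column numbers, and
# indptr[i]:indptr[i+1] delimits row i inside data/indices.
print(r.data)     # [1 2 3 4 5]
print(r.indices)  # [0 2 2 0 1]
print(r.indptr)   # [0 2 3 5]

c = csc_matrix(dense)
# CSC: the same idea column by column; indices now holds row numbers.
print(c.data)     # [1 4 5 2 3]
print(c.indices)  # [0 2 2 0 1]
print(c.indptr)   # [0 2 3 5]
```

Because every row's entries sit contiguously in data/indices, removing a stored entry means shifting everything after it, which is why SciPy treats structural changes to CSR as expensive.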

Q: What is a diagonal sparse matrix?

A diagonal sparse matrix is one whose non-zero elements lie only on its main diagonal. A closely related case is the tridiagonal sparse matrix, in which all non-zero elements lie on the main diagonal or on the diagonals directly above and below it. Such a matrix can be stored compactly by placing all of its band elements (3n - 2 of them for an n x n matrix) in a one-dimensional array, row by row.
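A sketch of the row-by-row packing just described. The helper names pack_tridiagonal and get_tridiagonal are hypothetical, written for this illustration only (they are not part of SciPy); the index formula 2i + j follows from counting how many band entries precede row i.

```python
import numpy as np

def pack_tridiagonal(dense):
    """Hypothetical helper: store the 3n-2 band entries of an n x n
    tridiagonal matrix in a 1-D array, row by row."""
    n = dense.shape[0]
    packed = []
    for i in range(n):
        # Row i holds columns i-1, i, i+1, clipped to the matrix bounds.
        for j in range(max(0, i - 1), min(n, i + 2)):
            packed.append(dense[i, j])
    return np.array(packed)

def get_tridiagonal(packed, i, j):
    """Hypothetical helper: look up element (i, j); off-band entries are 0."""
    if abs(i - j) > 1:
        return 0
    # For i >= 1, rows 0..i-1 hold 2 + 3(i-1) = 3i-1 entries and row i starts
    # at column i-1, so the offset is (3i-1) + (j-i+1) = 2i + j.
    # For i = 0 the offset is simply j, which the same formula gives.
    return packed[2 * i + j]

A = np.array([[4, 1, 0, 0],
              [2, 5, 1, 0],
              [0, 2, 6, 1],
              [0, 0, 2, 7]])
p = pack_tridiagonal(A)
print(p)  # [4 1 2 5 1 2 6 1 2 7]
```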

Tags: python, scipy, sparse-matrix

I want to remove diagonal elements from a sparse matrix. Since the matrix is sparse, these elements shouldn't be stored once removed.

SciPy provides a method to set the values of the diagonal elements: setdiag.

If I try it using lil_matrix, it works:

>>> import numpy as np
>>> from scipy.sparse import lil_matrix, csr_matrix
>>> a = np.ones((2,2))
>>> c = lil_matrix(a)
>>> c.setdiag(0)
>>> c
<2x2 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in LInked List format>

However, with csr_matrix, the diagonal elements do not seem to be removed from storage:

>>> b = csr_matrix(a)
>>> b
<2x2 sparse matrix of type '<type 'numpy.float64'>'
    with 4 stored elements in Compressed Sparse Row format>

>>> b.setdiag(0)
>>> b
<2x2 sparse matrix of type '<type 'numpy.float64'>'
    with 4 stored elements in Compressed Sparse Row format>

>>> b.toarray()
array([[ 0.,  1.],
       [ 1.,  0.]])
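The distinction the output above hinges on can be checked directly: for SciPy sparse matrices, nnz counts stored entries (explicit zeros included), while count_nonzero() counts entries whose value is actually non-zero. A minimal sketch, assuming Python 3 and a reasonably recent SciPy:

```python
import numpy as np
from scipy.sparse import csr_matrix

b = csr_matrix(np.ones((2, 2)))
b.setdiag(0)  # overwrites existing diagonal entries in place

# nnz counts stored entries, including the explicit zeros just written.
print(b.nnz)              # 4
# count_nonzero() counts only entries whose value is non-zero.
print(b.count_nonzero())  # 2
```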

Going through a dense array gives, of course:

>>> csr_matrix(b.toarray())
<2x2 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>

Is that intended? If so, is it due to the compressed format of csr matrices? Is there any workaround other than going from sparse to dense and back to sparse?


asked Nov 23 '16 11:11

kevad

1 Answer

Simply setting elements to 0 does not change the sparsity structure of a csr matrix: the zeros remain stored explicitly. You have to call eliminate_zeros() to remove them from storage.

In [807]: a=sparse.csr_matrix(np.ones((2,2)))
In [808]: a
Out[808]: 
<2x2 sparse matrix of type '<class 'numpy.float64'>'
    with 4 stored elements in Compressed Sparse Row format>
In [809]: a.setdiag(0)
In [810]: a
Out[810]: 
<2x2 sparse matrix of type '<class 'numpy.float64'>'
    with 4 stored elements in Compressed Sparse Row format>
In [811]: a.eliminate_zeros()
In [812]: a
Out[812]: 
<2x2 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>
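A self-contained version of the session above, with the imports it assumes, on a 3x3 matrix so the change in the index arrays is visible. It uses only setdiag, eliminate_zeros and the public nnz/indices/indptr attributes. Note that eliminate_zeros drops every explicit zero in the matrix, not only those on the diagonal.

```python
import numpy as np
from scipy import sparse

a = sparse.csr_matrix(np.ones((3, 3)))
a.setdiag(0)
print(a.nnz)        # 9: the three diagonal zeros are still stored

a.eliminate_zeros()  # removes explicit zeros from storage, in place
print(a.nnz)        # 6
print(a.indices)    # [1 2 0 2 0 1]
print(a.indptr)     # [0 2 4 6]
```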

Since changing the sparsity structure of a csr matrix is relatively expensive, SciPy lets you set values to 0 without changing the structure; removing the stored zeros is a separate, explicit step.

In [829]: %%timeit a=sparse.csr_matrix(np.ones((1000,1000)))
     ...: a.setdiag(0)
100 loops, best of 3: 3.86 ms per loop

In [830]: %%timeit a=sparse.csr_matrix(np.ones((1000,1000)))
     ...: a.setdiag(0)
     ...: a.eliminate_zeros()
SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
10 loops, best of 3: 133 ms per loop

In [831]: %%timeit a=sparse.lil_matrix(np.ones((1000,1000)))
     ...: a.setdiag(0)
100 loops, best of 3: 14.1 ms per loop
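Another option, not taken from the answer above and offered only as a sketch: filter the COO triplets so diagonal entries are never written at all. drop_diagonal is a hypothetical helper name; it only uses coo_matrix and the (data, (row, col)) constructor of csr_matrix. Unlike setdiag plus eliminate_zeros, it leaves any off-diagonal explicit zeros alone and never inserts new entries. No timings are claimed here; measure on your own data.

```python
import numpy as np
from scipy import sparse

def drop_diagonal(m):
    """Hypothetical helper: return a CSR copy of m without the main diagonal
    in storage, by keeping only off-diagonal COO triplets."""
    coo = sparse.coo_matrix(m)
    keep = coo.row != coo.col  # True for every stored off-diagonal entry
    return sparse.csr_matrix(
        (coo.data[keep], (coo.row[keep], coo.col[keep])), shape=coo.shape
    )

a = sparse.csr_matrix(np.ones((1000, 1000)))
b = drop_diagonal(a)
print(b.nnz)               # 999000
print(b.diagonal().any())  # False
```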


answered Oct 06 '22 20:10

hpaulj
