Efficient way to normalize a Scipy Sparse Matrix

Tags: python, numpy, scipy, sparse-matrix

I'd like to write a function that normalizes the rows of a large sparse matrix (such that they sum to one).

from pylab import *
import scipy.sparse as sp

def normalize(W):
    z = W.sum(0)
    z[z < 1e-6] = 1e-6
    return W / z[None,:]

w = (rand(10,10)<0.1)*rand(10,10)
w = sp.csr_matrix(w)
w = normalize(w)

However this gives the following exception:

File "/usr/lib/python2.6/dist-packages/scipy/sparse/base.py", line 325, in __div__
     return self.__truediv__(other)
File "/usr/lib/python2.6/dist-packages/scipy/sparse/compressed.py", line 230, in  __truediv__
   raise NotImplementedError

Are there any reasonably simple solutions? I have looked at this, but am still unclear on how to actually do the division.


asked Sep 06 '12 17:09

sterne

3 Answers

This has been implemented in scikit-learn sklearn.preprocessing.normalize.

from sklearn.preprocessing import normalize
w_normalized = normalize(w, norm='l1', axis=1)

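axis=1 normalizes each row; axis=0 normalizes each column. With norm='l1', each row is divided by the sum of its absolute values, so the rows sum to one when all entries are non-negative. Pass the optional argument copy=False to modify the matrix in place.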
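As a quick sanity check, here is a minimal sketch of that call on a small sparse matrix. The example values are made up for illustration, and it assumes scikit-learn is installed alongside SciPy.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import normalize

# Small non-negative example matrix; the second row is intentionally all zeros.
w = sp.csr_matrix(np.array([[1.0, 3.0, 0.0],
                            [0.0, 0.0, 0.0],
                            [2.0, 0.0, 2.0]]))

# L1-normalize each row (axis=1); the input matrix is left untouched.
w_normalized = normalize(w, norm='l1', axis=1)

row_sums = np.asarray(w_normalized.sum(axis=1)).ravel()
print(sp.issparse(w_normalized))  # the result is still a sparse matrix
print(row_sums)                   # non-empty rows sum to 1, the empty row stays 0
```

The all-zero row is left as zeros rather than producing a division error, which takes care of the small-sum case the question handles with a 1e-6 floor.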

answered Oct 03 '22 00:10

Aaron McDaid

While Aaron's answer is correct, I implemented my own solution because I wanted to normalize with respect to the maximum of the absolute values, which sklearn does not offer. My method takes the row indices of the nonzero entries and uses them to locate each row's values in the csr_matrix.data array, so the values can be rescaled there quickly.

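import numpy as np

def normalize_sparse(csr_matrix):
    # Row index of every nonzero entry, in the same order as csr_matrix.data
    # (this assumes the matrix stores no explicit zeros).
    nonzero_rows = csr_matrix.nonzero()[0]
    for idx in np.unique(nonzero_rows):
        data_idx = np.where(nonzero_rows == idx)[0]
        abs_max = np.max(np.abs(csr_matrix.data[data_idx]))
        if abs_max != 0:
            # Scale the row in place so that its largest absolute value is 1.
            csr_matrix.data[data_idx] = 1. / abs_max * csr_matrix.data[data_idx]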

In contrast to sunan's solution, this method does not require any casting of the matrix into dense format (which could raise memory problems) and no matrix multiplications either. I tested the method on a sparse matrix of shape (35'000, 486'000) and it took ~ 18 seconds.
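For very large matrices, the per-row Python loop may become the bottleneck. A possible vectorized variant of the same idea is sketched below. It is not the code from this answer: the function name normalize_rows_maxabs is hypothetical, and it assumes the input can be converted to CSR. It derives the row of each stored value from indptr and returns a copy instead of modifying the input.

```python
import numpy as np
import scipy.sparse as sp

def normalize_rows_maxabs(m):
    """Return a CSR copy of m with each row divided by its largest absolute value."""
    m = sp.csr_matrix(m, dtype=float, copy=True)
    n_rows = m.shape[0]
    # Number of stored entries per row, then the row index of every stored entry.
    nnz_per_row = np.diff(m.indptr)
    row_of_entry = np.repeat(np.arange(n_rows), nnz_per_row)
    # abs_max[r] = largest |value| among the stored entries of row r (0 for empty rows).
    abs_max = np.zeros(n_rows)
    np.maximum.at(abs_max, row_of_entry, np.abs(m.data))
    # Rows whose maximum is 0 keep a scale factor of 1, so nothing is divided by zero.
    scale = np.ones(n_rows)
    nonzero = abs_max > 0
    scale[nonzero] = 1.0 / abs_max[nonzero]
    m.data *= scale[row_of_entry]
    return m

example = sp.csr_matrix(np.array([[1.0, -4.0, 0.0],
                                  [0.0, 0.0, 0.0],
                                  [2.0, 0.0, 0.5]]))
result = normalize_rows_maxabs(example)
print(result.toarray())
```

Because the row index of each value comes from indptr rather than from nonzero(), this sketch does not depend on the matrix being free of explicitly stored zeros.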

answered Oct 03 '22 01:10

AlexConfused

Here is my solution:

1. Transpose A.
2. Calculate the sum of each column.
3. Build a diagonal matrix B from the reciprocals of the sums.
4. A*B gives the normalized matrix C.
5. Transpose C.

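import scipy.sparse as sp
import numpy as np
import math

minf = 0.0001  # sums smaller than this are treated as zero

# Build a small example matrix.
A = sp.lil_matrix((5,5))
b = np.arange(0,5)
A.setdiag(b[:-1], k=1)
A.setdiag(b)
print(A.todense())
A = A.T  # rows become columns
print(A.todense())

# Column sums of the transpose are the row sums of the original matrix.
sum_of_col = A.sum(0).tolist()
print(sum_of_col)
c = []
for i in sum_of_col:
    for j in i:
        if math.fabs(j)<minf:
            c.append(0)
        else:
            c.append(1/j)

print(c)

# Diagonal matrix holding the reciprocal of each sum.
B = sp.lil_matrix((5,5))
B.setdiag(c)
print(B.todense())

C = A*B  # scales column j of A by c[j]
print(C.todense())
C = C.T  # transpose back so that each nonzero row sums to one
print(C.todense())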
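The same diagonal-matrix idea can be written more compactly without the two transposes, because multiplying by the diagonal matrix on the left scales rows instead of columns. The sketch below is one possible way to do that; the function name normalize_rows_by_sum is hypothetical, and the minf threshold mirrors the one used above (rows whose sum is below it are scaled by 0).

```python
import numpy as np
import scipy.sparse as sp

def normalize_rows_by_sum(m, minf=1e-4):
    """Return a CSR matrix whose rows are the rows of m divided by their sums."""
    m = sp.csr_matrix(m, dtype=float)
    row_sums = np.asarray(m.sum(axis=1)).ravel()
    # Reciprocal of each row sum; rows with a (near-)zero sum get a factor of 0.
    inv = np.zeros_like(row_sums)
    ok = np.abs(row_sums) >= minf
    inv[ok] = 1.0 / row_sums[ok]
    # Left-multiplying by a diagonal matrix scales row i by inv[i].
    return (sp.diags(inv) @ m).tocsr()

# Same example matrix as above: 0..4 on the main diagonal, 0..3 on the superdiagonal.
A = np.diag(np.arange(5.0)) + np.diag(np.arange(4.0), k=1)
C = normalize_rows_by_sum(A)
row_sums_after = np.asarray(C.sum(axis=1)).ravel()
print(C.toarray())
print(row_sums_after)
```

Nothing is converted to dense format inside the function, so memory use stays proportional to the number of stored entries.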

answered Oct 03 '22 01:10

sunan
