Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Julia, How can I column-normalize a sparse matrix?

If I have constructed a sparse matrix using the sparse(i, j, k) constructor, how can I then normalize the columns of the matrix (so that each column sums to 1)? I cannot efficiently normalize the entries before I create the matrix, so any help is appreciated. Thanks!

like image 268
user3587051 Avatar asked Dec 14 '22 21:12

user3587051


2 Answers

The easiest way would be a broadcasting division by the sum of the columns:

julia> A = sprand(4,5,.5)
       A./sum(A,1)
4x5 Array{Float64,2}:
 0.0        0.0989976  0.0        0.0       0.0795486
 0.420754   0.458653   0.0986313  0.0       0.0
 0.0785525  0.442349   0.0        0.856136  0.920451
 0.500693   0.0        0.901369   0.143864  0.0

… but it looks like that hasn't been optimized for sparse matrices yet, and falls back to a full matrix. So a simple loop to iterate over the columns does the trick:

julia> for (col,s) in enumerate(sum(A,1))
         s == 0 && continue # What does a "normalized" column with a sum of zero look like?
         A[:,col] = A[:,col]/s
       end
       A
4x5 sparse matrix with 12 Float64 entries:
    [2, 1]  =  0.420754
    [3, 1]  =  0.0785525
    [4, 1]  =  0.500693
    [1, 2]  =  0.0989976
    [2, 2]  =  0.458653
    [3, 2]  =  0.442349
    [2, 3]  =  0.0986313
    [4, 3]  =  0.901369
    [3, 4]  =  0.856136
    [4, 4]  =  0.143864
    [1, 5]  =  0.0795486
    [3, 5]  =  0.920451

julia> sum(A,1)
1x5 Array{Float64,2}:
 1.0  1.0  1.0  1.0  1.0

This works entirely within sparse matrices and is done in-place (although it is still allocating new sparse matrices for each column slice).

like image 109
mbauman Avatar answered Dec 21 '22 09:12

mbauman


Given a Matrix A (does not matter whether or not it is sparse) normalize by any dimension

 A ./ sum(A,1) or  A ./ sum(A,2)

to show that it works:

A = sprand(10,10,0.3)   
println(sum(A,1))
println(A ./ sum(A,1))

only caveat

A[1,:] = 0
println(A ./ sum(A,1))

as you can see the column 1 now only contains NaNs because we divide by zero. Also we end up with a Matrix and not a sparse Matrix.

On the other hand one can quickly come up with an efficient specialized solution for your problem.

function normalize_columns(A :: SparseMatrixCSC)
          sums = sum(A,1)
          I,J,V = findnz(A)
          for idx in 1:length(V)
            V[idx] /= sums[J[idx]]
          end
          sparse(I,J,V)
end

@Matt B came up with a very similar answer while I was typing this up :)

like image 40
vchuravy Avatar answered Dec 21 '22 09:12

vchuravy