How to efficiently construct a large SparseArray? Packages for this?

Problem

Does Julia have an efficient way of constructing a huge sparse matrix from a given list of entries (u,v,w), some of which can have the same locations (u,v)? In that case, their weights w must be summed. Thus u, v, w are input vectors, and I wish to create a sparse matrix that has value w[i] at position u[i], v[i], with duplicates summed. For instance, the Mathematica code

n=10^6;   m=500*n;   
u=RandomInteger[{1,n},m];   
v=RandomInteger[{1,n},m];   
w=RandomInteger[{-9, 9}, m];   AbsoluteTiming[
SetSystemOptions["SparseArrayOptions"->{"TreatRepeatedEntries"->1}]; 
a= SparseArray[{u,v}\[Transpose] -> w, {n,n}]; ]

requires 135 sec and 60 GB of RAM. The equivalent Python code

import time

import numpy as np
import scipy.sparse as sp

def ti(): return time.perf_counter()

n = 10**6; m = 500*n
u = np.random.randint(0, n, size=m)
v = np.random.randint(0, n, size=m)
w = np.random.randint(-9, 10, size=m)

t0 = ti()
a = sp.csr_matrix((w, (u, v)), shape=(n, n), dtype=int)
t1 = ti(); print(t1 - t0)

needs 36 sec and 20 GB, but doesn't satisfy requirement (2) below (exact fraction entries). The equivalent Julia code

using SparseArrays
m = n = 10^6
r = 500 * n
u = rand(1:m, r)
v = rand(1:n, r)
w = rand(-9:9, r)
f() = sparse(u, v, w, m, n, +)
@time f()

needs 155 sec and 26 GB (the @time macro incorrectly reports that it used only 15 GB).

Is there a way to make this construction more efficient? Are there any Julia packages that are better at this?

Background

I have created a package for linear algebra and homological algebra computations. I wrote it in Mathematica, since the SparseArray implementation there is

  1. very efficient (fast and low RAM usage),
  2. supports exact fractions as matrix entries,
  3. supports polynomials as matrix entries.

However, it is not

  1. parallelized
  2. open source
  3. GPU or cluster enabled.

For various long-term reasons, I am considering migrating to a different programming language.

I have tried Python's scipy.sparse, but it doesn't satisfy (2). I'm not sure whether C++ libraries support (2). As a test, I tried sorting an array of 10^9 floats and compared the performance of Mathematica, Python numpy, and Julia ThreadsX. I was incredibly impressed by the latter (3x faster than numpy, 10x faster than Mathematica). I am considering migrating to Julia, but I wish to first make sure that my library would perform better than in Mathematica.

P.S. How can I make my f() above not print or return any output when called?

P.P.S. I also asked this question here.

asked Jul 17 '21 by Leo
1 Answer

In this particular case, it looks like you may be able to get what you are after more simply with (e.g.)

julia> @time a = sprand(Float64, 10^6, 10^6, 500/10^6)
 17.880987 seconds (7 allocations: 7.459 GiB, 7.71% gc time)
1000000×1000000 SparseMatrixCSC{Float64, Int64} with 499998416 stored entries:
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⋮ (uniform Braille sparsity-pattern display elided) ⋮

More generally, you may want to check out the underlying method SparseArrays.sparse!, which will allow more efficient in-place construction.

help?> SparseArrays.sparse!
  sparse!(I::AbstractVector{Ti}, J::AbstractVector{Ti}, V::AbstractVector{Tv},
          m::Integer, n::Integer, combine, klasttouch::Vector{Ti},
          csrrowptr::Vector{Ti}, csrcolval::Vector{Ti}, csrnzval::Vector{Tv},
          [csccolptr::Vector{Ti}], [cscrowval::Vector{Ti}, cscnzval::Vector{Tv}] ) where {Tv,Ti<:Integer}

  Parent of and expert driver for sparse; see sparse for basic usage. This method allows
  the user to provide preallocated storage for sparse's intermediate objects and result
  as described below. This capability enables more efficient successive construction of
  SparseMatrixCSCs from coordinate representations, and also enables extraction of an
  unsorted-column representation of the result's transpose at no additional cost.

  This method consists of three major steps: (1) Counting-sort the provided coordinate
  representation into an unsorted-row CSR form including repeated entries. (2) Sweep
  through the CSR form, simultaneously calculating the desired CSC form's column-pointer
  array, detecting repeated entries, and repacking the CSR form with repeated entries
  combined; this stage yields an unsorted-row CSR form with no repeated entries. (3)
  Counting-sort the preceding CSR form into a fully-sorted CSC form with no repeated
  entries.

  Input arrays csrrowptr, csrcolval, and csrnzval constitute storage for the
  intermediate CSR forms and require length(csrrowptr) >= m + 1, length(csrcolval) >=
  length(I), and length(csrnzval) >= length(I). Input array klasttouch, workspace for
  the second stage, requires length(klasttouch) >= n. Optional input arrays csccolptr,
  cscrowval, and cscnzval constitute storage for the returned CSC form S. csccolptr
  requires length(csccolptr) >= n + 1. If necessary, cscrowval and cscnzval are
  automatically resized to satisfy length(cscrowval) >= nnz(S) and length(cscnzval) >=
  nnz(S); hence, if nnz(S) is unknown at the outset, passing in empty vectors of the
  appropriate type (Vector{Ti}() and Vector{Tv}() respectively) suffices, or calling the
  sparse! method neglecting cscrowval and cscnzval.

  On return, csrrowptr, csrcolval, and csrnzval contain an unsorted-column
  representation of the result's transpose.

  You may reuse the input arrays' storage (I, J, V) for the output arrays (csccolptr,
  cscrowval, cscnzval). For example, you may call sparse!(I, J, V, csrrowptr, csrcolval,
  csrnzval, I, J, V).

  For the sake of efficiency, this method performs no argument checking beyond 1 <= I[k]
  <= m and 1 <= J[k] <= n. Use with care. Testing with --check-bounds=yes is wise.

  This method runs in O(m, n, length(I)) time. The HALFPERM algorithm described in F.
  Gustavson, "Two fast algorithms for sparse matrices: multiplication and permuted
  transposition," ACM TOMS 4(3), 250-269 (1978) inspired this method's use of a pair of
  counting sorts.
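
Following that docstring, here is a minimal sketch of how one might call it with the question's u, v, w, using + to sum duplicates. The buffer names are just local variables, and per the docstring the input arrays can double as output storage; this is illustrative, not a tuned implementation.

using SparseArrays

m = n = 10^6; r = 500*n
u = rand(1:m, r); v = rand(1:n, r); w = rand(-9:9, r)

# Workspace and intermediate CSR buffers; these can be reused
# across repeated constructions of same-shaped matrices.
klasttouch = Vector{Int}(undef, n)
csrrowptr  = Vector{Int}(undef, m + 1)
csrcolval  = Vector{Int}(undef, length(u))
csrnzval   = Vector{Int}(undef, length(u))

# Combine duplicate (u,v) locations with +, writing the CSC result
# into the storage of u, v, w themselves, as the docstring allows.
a = SparseArrays.sparse!(u, v, w, m, n, +, klasttouch,
                         csrrowptr, csrcolval, csrnzval,
                         u, v, w)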

If you need something even faster than this, in principle you can construct a sparse matrix with virtually no overhead by using the SparseMatrixCSC constructor directly (though you have to know what you're doing and what the fields mean). As some of the early contributors to what is now the SparseArrays stdlib (at the time part of Base) noted:

I consider using the SparseMatrixCSC constructor directly to be the I-know-what-I'm-doing interface. People use it to skip the overhead and checking of sparse, for example to create matrices with non-sorted rows, or explicit zeros, things like that. I'd be fine with making show do some validity checks on the assumptions it makes before trying to show the matrix, but having a lenient constructor as a direct way of saying "trust me, wrap this data in a SparseMatrixCSC" is a valuable thing.

If you want, you could certainly write parallelized functions (using ThreadsX.jl, the threaded version of LoopVectorization.jl, or whatever else) to construct the underlying arrays directly in parallel, and then wrap them with SparseMatrixCSC; a sketch of that follows.
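
For instance, a hedged sketch using Base threads (same synthetic pattern as the example in the Edit below; purely illustrative, not a tuned implementation):

using SparseArrays

# Fill the three CSC arrays in parallel, then wrap them.
# Pattern: 500 entries per column, at rows 1, 2001, 4001, ...
m = n = 10^6
colptr = collect(1:500:n*500 + 1)
rowval = Vector{Int}(undef, n*500)
nzval  = Vector{Float64}(undef, n*500)
Threads.@threads for j in 1:n
    base = (j - 1) * 500
    for (k, i) in enumerate(1:2000:m)
        rowval[base + k] = i      # row indices sorted within each column
        nzval[base + k]  = rand()
    end
end
a = SparseMatrixCSC(m, n, colptr, rowval, nzval)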

For all the details on how to construct a SparseMatrixCSC directly, you might check out https://github.com/JuliaLang/julia/blob/master/stdlib/SparseArrays/src/sparsematrix.jl

Edit: to give an illustrative example (though see the warnings below):

using SparseArrays
m = n = 10^6
colptr = collect(1:500:n*500+1)   # 500 entries per column
rowval = repeat(1:2000:m, n)      # rows 1, 2001, 4001, ... in each column
nzval = rand(Float64, n*500)

julia> @time a = SparseMatrixCSC(m, n, colptr, rowval, nzval)
  0.012364 seconds (1 allocation: 48 bytes)
1000000×1000000 SparseMatrixCSC{Float64, Int64} with 500000000 stored entries:
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⋮ (uniform Braille sparsity-pattern display elided) ⋮

julia> a[1,1]
0.9356478932120766

julia> a[2,1]
0.0

julia> a[2001,1]
0.2121877970335444

julia> a[:,1]
1000000-element SparseVector{Float64, Int64} with 500 stored entries:
  [1      ]  =  0.935648
  [2001   ]  =  0.212188
  [4001   ]  =  0.429638
  [6001   ]  =  0.0190535
  [8001   ]  =  0.0878085
  [10001  ]  =  0.24005
  [12001  ]  =  0.785151
  [14001  ]  =  0.348142
             ⋮
  [982001 ]  =  0.637904
  [984001 ]  =  0.136397
  [986001 ]  =  0.483078
  [988001 ]  =  0.862434
  [990001 ]  =  0.703863
  [992001 ]  =  0.00990588
  [994001 ]  =  0.629455
  [996001 ]  =  0.123507
  [998001 ]  =  0.411807

Two major caveats:

  1. There is no checking that rows are sorted in this example. If you don't know what that means or why that matters, this sort of low-level approach is not for you.
  2. This provides no checking against collisions. If you already know from the source of your data that there are no collisions, that is fine. For random matrices, it may or may not be fine statistically, depending on the level of sparsity. But again, if you don't know what that means, stay away. (A quick sanity check is sketched below.)
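
If in doubt, a quick sanity check along these lines (a hypothetical helper, not part of SparseArrays) can catch unsorted rows, duplicates, and out-of-range indices before you trust the raw constructor:

# Hypothetical validity check for raw CSC arrays: each column's row
# indices must be strictly increasing (sorted, collision-free) and in 1:m.
function validate_csc(m::Integer, n::Integer, colptr, rowval)
    length(colptr) == n + 1 || return false
    (colptr[1] == 1 && colptr[end] == length(rowval) + 1) || return false
    for j in 1:n
        lo, hi = colptr[j], colptr[j+1] - 1
        lo > hi && continue                  # empty column
        (1 <= rowval[lo] && rowval[hi] <= m) || return false
        for k in lo:(hi - 1)
            rowval[k] < rowval[k+1] || return false
        end
    end
    return true
end

validate_csc(m, n, colptr, rowval)   # true for the example above
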
answered Dec 06 '22 by cbk