Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What library can I use for calc large sparse matrix in Java or Scala?

When I use large sparse matrix, it's better to use compressed matrix like CCS, CRS and so on.

I tried to use ScalaNLP, la4j, colc to calc 100,000*100,000 sparse matrix. There are some problems.

  1. Breeze (ScalaNLP/Scalala)

    • it gives me the CSCMatrix type which can have 100,000*100,000 size.
    • but the problem is it's under development.
    • so we cannot calc element-wise product of CSCMatrix with CSCMatrix, like csc1 :* csc2.
    • and also you cannot add CSCMatrixes to each other.
  2. la4j

    • It has CCSMatrix and CRSMatrix.
    • but when creating (new CCSMatrixFactory).createMatrix(100000, 100000), it occur OutOfMemoryError.
    • The matrix should be zeros, so it should not use large memory spaces.
  3. colc

    • It has SparseDoubleMatrix2D.
    • but when creating matrix like new SparseDoubleMatrix2d(100000, 100000), it says IllegalArgumentException: matrix too large.

To calc large sparse matrix, what library can I use? could you show me the example?

like image 276
krrrr38 Avatar asked Jun 15 '13 02:06

krrrr38


2 Answers

I was curious with Breeze, so I looked into the source. It's a bit messy because the operators are all emitted from some println style code generation (!)... But I came up with this:

import breeze.linalg.operators.{BinaryOp, OpMulScalar}

object CSCMatrixExtraOps {
  abstract class CSCMatrixCanMulM_M[@specialized (Int, Float, Long, Double) A]
    extends BinaryOp[CSCMatrix[A], CSCMatrix[A], OpMulScalar, CSCMatrix[A]] {

    protected def times(a: A, b: A): A

    protected def zeros  (rows: Int, cols: Int): CSCMatrix[A]
    protected def builder(rows: Int, cols: Int, sz: Int): CSCMatrix.Builder[A]

    final def apply(a: CSCMatrix[A], b: CSCMatrix[A]): CSCMatrix[A] = {
      val rows  = a.rows
      val cols  = a.cols
      require(rows == b.rows, "Matrices must have same number of rows!")
      require(cols == b.cols, "Matrices must have same number of cols!")

      if (cols == 0) return zeros(rows, cols)

 

      val res     = builder(rows, cols, math.min(a.activeSize, b.activeSize))
      var ci      = 0
      var acpStop = a.colPtrs(0)
      var bcpStop = b.colPtrs(0)
      while (ci < cols) {
        val ci1 = ci + 1
        var acp = acpStop
        var bcp = bcpStop
        acpStop = a.colPtrs(ci1)
        bcpStop = b.colPtrs(ci1)
        while (acp < acpStop && bcp < bcpStop) {
          val ari = a.rowIndices(acp)
          val bri = b.rowIndices(bcp)
          if (ari == bri) {
            val v = times(a.data(acp), b.data(bcp))
            res.add(ari, ci, v)
            acp += 1
            bcp += 1
          } else if (ari < bri) {
            acp += 1
          } else /* ari > bri */ {
            bcp += 1
          }
        }
        ci = ci1
      }

      res.result()
    }
  }

 

  implicit object CSCMatrixCanMulM_M_Int extends CSCMatrixCanMulM_M[Int] {
    protected def times(a: Int, b: Int) = a * b
    protected def zeros(rows: Int, cols: Int) = CSCMatrix.zeros(rows, cols)
    protected def builder(rows: Int, cols: Int, sz: Int) = 
      new CSCMatrix.Builder(rows, cols, sz)
  }

  implicit object CSCMatrixCanMulM_M_Double extends CSCMatrixCanMulM_M[Double] {
    protected def times(a: Double, b: Double) = a * b
    protected def zeros(rows: Int, cols: Int) = CSCMatrix.zeros(rows, cols)
    protected def builder(rows: Int, cols: Int, sz: Int) = 
      new CSCMatrix.Builder(rows, cols, sz)
  }
}

Example:

import breeze.linalg._
import CSCMatrixExtraOps._

val m1 = CSCMatrix((0, 0, 0), (0, 5, 0), (0, 0, 10), (0, 13, 0))
val m2 = CSCMatrix((0, 0, 0), (0, 5, 0), (0, 0, 10), (13, 0, 0))
(m1 :* m2).toDenseMatrix

Result:

0  0   0    
0  25  0    
0  0   100  
0  0   0    
like image 90
0__ Avatar answered Oct 17 '22 01:10

0__


I'm the author of la4j library. Let me give you few advices. So, when you created a new CRS/CCS matrix, la4j allocates only 32-lenght array for it (it is a default minimum size). Thus, it can not throw a OOM error (I've just checked it):

Matrix a = Matrices.CRS_FACTORY.createMatrix(100000, 100000);

But, it is better to use public constructor:

Matrix a = new CCSMatrix(100000, 100000);

Anyway, if you still getting this error, try to extend your heap size with -Xmx1024m -Xms512m.

And what do you mean by "The matrix should be zeros, so it should not use large memory spaces." I'm not sure that I understood it correctly.

BTW, use the last version of la4j: 0.4.0. Probably, issue you found was fixed by this pull-request.

like image 40
Vladimir Kostyukov Avatar answered Oct 16 '22 23:10

Vladimir Kostyukov