
How to replace elements of a Breeze matrix in Scala based on some condition?

I am working with two-dimensional Breeze matrices in Scala. At some point I have to do element-wise division of two matrices. Some elements of the denominator matrix can be zero, which produces NaNs in the result.

I can loop over the matrix dimensions and replace each 0.0 with some value greater than zero.

Is there a simpler or more idiomatic Scala solution for this?

asked Dec 24 '22 19:12 by inferno


2 Answers

Step-by-step:

  • With example matrix:

    import breeze.linalg._

    val dm = DenseMatrix((1.0, 0.0, 3.0), (0.0, 5.0, 6.0))
    
  • Find out which elements are equal to 0.0:

    dm :== 0.0
    
    breeze.linalg.DenseMatrix[Boolean] =
    false  true   false
    true   false  false
    
  • Slice the matrix with that Boolean mask:

    dm(dm :== 0.0)
    
    breeze.linalg.SliceVector[(Int, Int),Double] = breeze.linalg.SliceVector@2b
    
  • Assign through the slice to replace the selected elements in place:

    dm(dm :== 0.0) := 42.0
    
    breeze.linalg.Vector[Double] = breeze.linalg.SliceVector@2b
    
  • Check the matrix:

    dm
    
    breeze.linalg.DenseMatrix[Double] =
    1.0   42.0  3.0
    42.0  5.0   6.0
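
The steps above can be folded into a single helper for the division case in the question. This is only a minimal sketch: `safeDivide` and its `fill` parameter are hypothetical names, not part of Breeze, and the `/:/` operator assumes a Breeze version that provides it (older releases, as far as I know, spell element-wise division `:/`). The helper copies the denominator first, because assigning through a Boolean slice modifies the matrix in place.

```scala
import breeze.linalg._

// Hypothetical helper (not part of Breeze): element-wise division in which
// zeros in the denominator are first replaced by `fill`.
def safeDivide(num: DenseMatrix[Double],
               den: DenseMatrix[Double],
               fill: Double): DenseMatrix[Double] = {
  val d = den.copy        // work on a copy so the caller's `den` is untouched
  d(d :== 0.0) := fill    // Boolean-mask assignment, as in the steps above
  num /:/ d               // element-wise division
}

val num = DenseMatrix((1.0, 2.0), (3.0, 4.0))
val den = DenseMatrix((2.0, 0.0), (0.0, 4.0))
val out = safeDivide(num, den, 1.0)
```

With `fill = 1.0`, entries whose denominator was zero simply keep the numerator's value; `den` itself still contains its zeros afterwards.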
    
answered May 24 '23 13:05 by zero323


Mapping over the result to replace the NaNs is faster than slicing, at least in the quick timing below.

import breeze.linalg._

val matr = DenseMatrix((1.0, 0.0, 3.0), (0.0, 11.0, 12.0),
      (1.0, 2.0, 0.0))
val matr2 = DenseMatrix((3.0, 0.0, 1.0), (0.0, 12.0, 11.0),
      (2.0, 1.0, 0.0))

def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block    // call-by-name
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
  result
}

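// Divide first, then replace every NaN in the result (NaN arises from 0.0 / 0.0).
def replaceZeroes1(mat1: DenseMatrix[Double], mat2: DenseMatrix[Double], rep: Double) = {
  (mat1 /:/ mat2).map(x => if (x.isNaN) rep else x)
}

// Overwrite the zeros before dividing. Note: this mutates both arguments in place.
def replaceZeroes2(mat1: DenseMatrix[Double], mat2: DenseMatrix[Double], rep: Double) = {
  mat1(mat1 :== 0.0) := rep
  mat2(mat2 :== 0.0) := 1.0
  mat1 /:/ mat2
}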
time(println(replaceZeroes1(matr, matr2, 42.0)))
time(println(replaceZeroes2(matr, matr2, 42.0)))

Produces:

0.3333333333333333  42.0                3.0                 
42.0                0.9166666666666666  1.0909090909090908  
0.5                 2.0                 42.0                
Elapsed time: 13087782ns
Replace Zero2
0.3333333333333333  42.0                3.0                 
42.0                0.9166666666666666  1.0909090909090908  
0.5                 2.0                 42.0                
Elapsed time: 16613179ns

Mapping out the NaNs is both quicker and more straightforward. It is faster even if you remove the second slice from replaceZeroes2.
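
One caveat about the mapping approach: under IEEE 754 arithmetic only 0.0 / 0.0 yields NaN, while a non-zero value divided by 0.0 yields positive or negative Infinity, which `isNaN` does not catch. The example matrices above happen to have their zeros in the same positions, so the difference does not show up there. A hedged variant that handles both cases could look like this (`divideReplacingNonFinite` is a hypothetical name, not a Breeze function):

```scala
import breeze.linalg._

// Hypothetical helper: divide element-wise, then replace every non-finite
// result (NaN from 0/0, +/-Infinity from x/0 with x != 0) with `rep`.
def divideReplacingNonFinite(a: DenseMatrix[Double],
                             b: DenseMatrix[Double],
                             rep: Double): DenseMatrix[Double] =
  (a /:/ b).map(x => if (x.isNaN || x.isInfinity) rep else x)

val a = DenseMatrix((1.0, 0.0), (3.0, 4.0))
val b = DenseMatrix((0.0, 0.0), (2.0, 4.0))
// 1.0 / 0.0 is Infinity and 0.0 / 0.0 is NaN; both are replaced by 42.0
val r = divideReplacingNonFinite(a, b, 42.0)
```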

NOTE: This was tested only with Breeze, not in Spark with very large datasets. The timings may be different there (although I doubt it).
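
The timings above also come from a single cold run of each function on 3x3 matrices, and they include the cost of `println`, so they should be read as a rough indication only. A slightly more careful measurement would warm up the JVM first and average over many runs. The following is a sketch of such a harness; `timeAvg` and its parameters are hypothetical names introduced here, and a dedicated benchmarking tool would still give more trustworthy numbers.

```scala
// Hypothetical helper (not from the original answer): runs `block` a few
// times untimed so the JIT can warm up, then returns the mean wall-clock
// time in nanoseconds over `runs` timed executions.
def timeAvg[R](warmup: Int, runs: Int)(block: => R): Double = {
  (1 to warmup).foreach(_ => block)   // untimed warm-up iterations
  val t0 = System.nanoTime()
  (1 to runs).foreach(_ => block)     // timed iterations
  (System.nanoTime() - t0).toDouble / runs
}

// Small stand-in workload; `calls` counts how often the block was evaluated.
var calls = 0
val meanNs = timeAvg(warmup = 2, runs = 3) { calls += 1; math.sqrt(calls.toDouble) }
```

To compare the two functions with it, something like `timeAvg(1000, 1000)(replaceZeroes1(matr, matr2, 42.0))` should work. Because replaceZeroes2 overwrites its arguments, it would need fresh copies on each run (for example `matr.copy` and `matr2.copy`) to keep doing the same work.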

BONUS:

If you are simply trying to produce a matrix of 1s and 0s from a matrix with arbitrary values (for example, an unweighted network from a weighted one), I would just use:

(mat /:/ mat).map(x => if (x.isNaN) 0.0 else x)
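
The same 1s-and-0s matrix can also be produced without the division, by testing each entry directly. This is a sketch under the assumption that the input contains only finite values; `binarize` is a hypothetical name. The two versions differ on NaN or infinite entries: the division trick maps them to 0.0, while the direct test below maps them to 1.0.

```scala
import breeze.linalg._

// Hypothetical helper: 1.0 where the entry is non-zero, 0.0 where it is zero.
def binarize(mat: DenseMatrix[Double]): DenseMatrix[Double] =
  mat.map(x => if (x != 0.0) 1.0 else 0.0)

val weighted = DenseMatrix((0.5, 0.0), (0.0, 3.0))
val unweighted = binarize(weighted)
```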
answered May 24 '23 11:05 by Ryan Deschamps