I am working with two-dimensional Breeze matrices in Scala. At some point I have to do element-wise division of two matrices. Some elements in the denominator matrix can be zero, resulting in NaNs in the result.
I can loop through the matrix dimensions and replace the 0.0s with something >0.
But is there a simpler or more idiomatic Scala solution for this?
Step-by-step:
With example matrix:
import breeze.linalg._
val dm = DenseMatrix((1.0, 0.0, 3.0), (0.0, 5.0, 6.0))
Find out which elements are equal to 0.0:
dm :== 0.0
breeze.linalg.DenseMatrix[Boolean] =
false true false
true false false
Slice the matrix:
dm(dm :== 0.0)
breeze.linalg.SliceVector[(Int, Int),Double] = breeze.linalg.SliceVector@2b
Use sliced matrix for replacement:
dm(dm :== 0.0) := 42.0
breeze.linalg.Vector[Double] = breeze.linalg.SliceVector@2b
Check the matrix:
dm
breeze.linalg.DenseMatrix[Double] =
1.0 42.0 3.0
42.0 5.0 6.0
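Note that the assignment above changes dm in place. If the original denominator has to survive, the same steps can be applied to a copy. The following is only a sketch: divideReplacingZeros is a hypothetical helper name (not part of Breeze), the sample matrices are made up for illustration, and it assumes a Breeze version that provides the /:/ operator. It is meant to be pasted into a REPL with Breeze on the classpath.

```scala
import breeze.linalg._

// Hypothetical helper (not a Breeze function): divide num by den element-wise,
// substituting `fill` for every zero in the denominator first.
def divideReplacingZeros(num: DenseMatrix[Double],
                         den: DenseMatrix[Double],
                         fill: Double): DenseMatrix[Double] = {
  val safeDen = den.copy            // work on a copy so the caller's matrix is untouched
  safeDen(safeDen :== 0.0) := fill  // boolean-mask assignment, as in the steps above
  num /:/ safeDen                   // element-wise division
}

// Made-up sample data
val num = DenseMatrix((2.0, 4.0, 6.0), (8.0, 10.0, 12.0))
val den = DenseMatrix((1.0, 0.0, 3.0), (0.0, 5.0, 6.0))
val result = divideReplacingZeros(num, den, 1.0)
```

With a fill value of 1.0, the entries whose denominator was zero simply keep the numerator's value, and den itself still contains its zeros afterwards.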
Mapping out the NaN is faster than slicing.
import breeze.linalg._

val matr = DenseMatrix((1.0, 0.0, 3.0), (0.0, 11.0, 12.0), (1.0, 2.0, 0.0))
val matr2 = DenseMatrix((3.0, 0.0, 1.0), (0.0, 12.0, 11.0), (2.0, 1.0, 0.0))

// Runs the block once and prints the elapsed wall-clock time in nanoseconds.
def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block // call-by-name
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
  result
}

// Divide first, then map every NaN (produced by 0.0 / 0.0) to rep.
def replaceZeroes1(mat1: DenseMatrix[Double], mat2: DenseMatrix[Double], rep: Double) = {
  (mat1 /:/ mat2).map(x => if (x.isNaN) rep else x)
}

// Replace the zeros by slicing, then divide.
// Note: this modifies mat1 and mat2 in place.
def replaceZeroes2(mat1: DenseMatrix[Double], mat2: DenseMatrix[Double], rep: Double) = {
  mat1(mat1 :== 0.0) := rep
  mat2(mat2 :== 0.0) := 1.0
  mat1 /:/ mat2
}

time(println(replaceZeroes1(matr, matr2, 42.0)))
time(println(replaceZeroes2(matr, matr2, 42.0)))
Produces:
0.3333333333333333 42.0 3.0
42.0 0.9166666666666666 1.0909090909090908
0.5 2.0 42.0
Elapsed time: 13087782ns
Replace Zero2
0.3333333333333333 42.0 3.0
42.0 0.9166666666666666 1.0909090909090908
0.5 2.0 42.0
Elapsed time: 16613179ns
Mapping out the NaN is both quicker and more straightforward. It is faster even if you remove the second slice from replaceZeroes2.
NOTE: This was not tested in Spark with very large datasets, only in Breeze. In that case it's possible that the timings differ (although I doubt it).
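One caveat on the mapping approach: it only catches NaN, which IEEE 754 division produces for 0.0 / 0.0. A nonzero numerator over a zero denominator yields Infinity or -Infinity instead, and those pass through replaceZeroes1 unchanged. The example matrices above only have zeros in matching positions, so the issue does not show up there. A possible variant that handles both cases is sketched below; divideReplacingNonFinite is a hypothetical name and the sample matrices are made up for illustration.

```scala
import breeze.linalg._

// Hypothetical variant of replaceZeroes1 that also replaces +/-Infinity.
def divideReplacingNonFinite(mat1: DenseMatrix[Double],
                             mat2: DenseMatrix[Double],
                             rep: Double): DenseMatrix[Double] =
  (mat1 /:/ mat2).map(x => if (x.isNaN || x.isInfinity) rep else x)

// Made-up sample data: 1.0 / 0.0 is Infinity, 0.0 / 0.0 is NaN
val a = DenseMatrix((1.0, 0.0), (3.0, 4.0))
val b = DenseMatrix((0.0, 0.0), (1.5, 2.0))
val cleaned = divideReplacingNonFinite(a, b, 42.0)
```

Here both entries of the first row become 42.0, while the second row holds the ordinary quotients 2.0 and 2.0.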
BONUS:
If you are simply trying to produce a matrix of 1s and 0s from a matrix with arbitrary values (such as producing an unweighted network from a weighted network), I would just use:
(mat /:/ mat).map(x => if (x.isNaN) 0.0 else x)
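As a small worked example of that one-liner, here is a sketch with a made-up weighted adjacency matrix (the names weighted and unweighted are illustrative only). Every nonzero weight divided by itself gives 1.0, and every 0.0 / 0.0 gives NaN, which is then mapped to 0.0.

```scala
import breeze.linalg._

// Made-up weighted adjacency matrix of a three-node network
val weighted = DenseMatrix(
  (0.0, 2.5, 0.0),
  (2.5, 0.0, 7.0),
  (0.0, 7.0, 0.0)
)

// x / x is 1.0 for nonzero x and NaN for 0.0, so this leaves only 1s and 0s
val unweighted = (weighted /:/ weighted).map(x => if (x.isNaN) 0.0 else x)
```

This relies on the input containing only finite values, since Infinity / Infinity is also NaN and would be mapped to 0.0.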