I have two arrays (that i have pulled out of a matrix (Array[Array[Int]]) and I need to subtract one from the other.
At the moment I am using this method however, when I profile it, it is the bottleneck.
def subRows(a: Array[Int], b: Array[Int], sizeHint: Int): Array[Int] = {
val l: Array[Int] = new Array(sizeHint)
var i = 0
while (i < sizeHint) {
l(i) = a(i) - b(i)
i += 1
}
l
}
I need to do this billions of times so any improvement in speed is a plus.
I have tried using a List
instead of an Array
to collect the differences and it is MUCH faster but I lose all benefit when I convert it back to an Array
.
I did modify the downstream code to take a List
to see if that would help but I need to access the contents of the list out of order so again there is loss of any gains there.
It seems like any conversion of one type to another is expensive and I am wondering if there is some way to use a map etc. that might be faster.
Is there a better way?
Not sure what I did the first time!?
So the code I used to test it was this:
def subRowsArray(a: Array[Int], b: Array[Int], sizeHint: Int): Array[Int] = {
val l: Array[Int] = new Array(sizeHint)
var i = 0
while (i < sizeHint) {
l(i) = a(i) - b(i)
i += 1
}
l
}
def subRowsList(a: Array[Int], b: Array[Int], sizeHint: Int): List[Int] = {
var l: List[Int] = Nil
var i = 0
while (i < sizeHint) {
l = a(i) - b(i) :: l
i += 1
}
l
}
val a = Array.fill(100, 100)(scala.util.Random.nextInt(2))
val loops = 30000 * 10000
def runArray = for (i <- 1 to loops) subRowsArray(a(scala.util.Random.nextInt(100)), a(scala.util.Random.nextInt(100)), 100)
def runList = for (i <- 1 to loops) subRowsList(a(scala.util.Random.nextInt(100)), a(scala.util.Random.nextInt(100)), 100)
def optTimer(f: => Unit) = {
val s = System.currentTimeMillis
f
System.currentTimeMillis - s
}
The results I thought I got the first time I did this were the exact opposite... I must have misread or mixed up the methods.
My apologies for asking a bad question.
That code is the fastest you can manage single-threaded using a standard JVM. If you think List
is faster, you're either fooling yourself or not actually telling us what you're doing. Putting an Int
into List
requires two object creations: one to create the list element, and one to box the integer. Object creations take about 10x longer than an array access. So it's really not a winning proposition to do it any other way.
If you really, really need to go faster, and must stay with a single thread, you should probably switch to C++ or the like and explicitly use SSE instructions. See this question, for example.
If you really, really need to go faster and can use multiple threads, then the easiest is to package up a chunk of work like this (i.e. a sensible number of pairs of vectors that need to be subtracted--probably at least a few million elements per chunk) into a list as long as the number of processors on your machine, and then call list.par.map(yourSubtractionRoutineThatActsOnTheChunkOfWork)
.
Finally, if you can be destructive,
a(i) -= b(i)
in the inner loop is, of course, faster. Likewise, if you can reuse space (e.g. with System.arraycopy
), you're better off than if you have to keep allocating it. But that changes the interface from what you've shown.
You can use Scalameter to try a benchmark the two implementations which requires at least JRE 7 update 4 and Scala 2.10 to be run. I used scala 2.10 RC2.
Compile with scalac -cp scalameter_2.10-0.2.jar RangeBenchmark.scala
.
Run with scala -cp scalameter_2.10-0.2.jar:. RangeBenchmark
.
Here's the code I used:
import org.scalameter.api._
object RangeBenchmark extends PerformanceTest.Microbenchmark {
val limit = 100
val a = new Array[Int](limit)
val b = new Array[Int](limit)
val array: Array[Int] = new Array(limit)
var list: List[Int] = Nil
val ranges = for {
size <- Gen.single("size")(limit)
} yield 0 until size
measure method "subRowsArray" in {
using(ranges) curve("Range") in {
var i = 0
while (i < limit) {
array(i) = a(i) - b(i)
i += 1
}
r => array
}
}
measure method "subRowsList" in {
using(ranges) curve("Range") in {
var i = 0
while (i < limit) {
list = a(i) - b(i) :: list
i += 1
}
r => list
}
}
}
Here's the results:
::Benchmark subRowsArray::
Parameters(size -> 100): 8.26E-4
::Benchmark subRowsList::
Parameters(size -> 100): 7.94E-4
You can draw your own conclusions. :)
The stack blew up on larger values of limit
. I'll guess it's because it's measuring the performance many times.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With