Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala - Iterate Over Two Arrays

Tags:

scala

How do you iterate over two arrays of the same size, accessing the same index each iteration The Scala Way™?

      for ((aListItem, bListItem) <- (aList, bList)) {
         // do something with items
      }

The Java way applied to Scala:

     for(i <- 0 until aList.length ) {
          aList(i)
          bList(i)
      }

Assume both lists are the same size.

like image 201
BAR Avatar asked Feb 05 '15 03:02

BAR


1 Answers

tl;dr: There are trade-offs between speed and convenience; you need to know your use case to pick appropriately.


If you know both arrays are the same length and you don't need to worry how fast it is, the easiest and most canonical is to use zip inside a for-comprehension:

for ((a,b) <- aList zip bList) { ??? }

The zip method creates a new single array, however. To avoid that overhead you can use zipped on a tuple which will present the elements in pairs to methods like foreach and map:

(aList, bList).zipped.foreach{ (a,b) => ??? }

Faster still is to index into the arrays, especially if the arrays contain primitives like Int, since the generic code above has to box them. There is a handy method indices that you can use:

for (i <- aList.indices) { ??? }

Finally, if you need to go as fast as you possibly can, you can fall back to manual while loops or recursion, like so:

// While loop
var i = 0
while (i < aList.length) {
  ???
  i += 1
}

// Recursion
def loop(i: Int) {
  if (i < aList.length) {
    ???
    loop(i+1)
  }
}
loop(0)

If you are computing some value, rather than having it be a side effect, it's sometimes faster with recursion if you pass it along:

// Recursion with explicit result
def loop(i: Int, acc: Int = 0): Int =
  if (i < aList.length) {
    val nextAcc = ???
    loop(i+1, nextAcc)
  }
  else acc

Since you can drop a method definition in anywhere, you can use recursion without restriction. You can add an @annotation.tailrec annotation to make sure it can be compiled down to a fast loop with jumps instead of actual recursion that eats stack space.

Taking all these different approaches to calculate a dot product on length 1024 vectors, we can compare these to a reference implementation in Java:

public class DotProd {
  public static int dot(int[] a, int[] b) {
    int s = 0;
    for (int i = 0; i < a.length; i++) s += a[i]*b[i];
    return s;
  }
}

plus an equivalent version where we take the dot product of the lengths of strings (so we can assess objects vs. primitives)

normalized time
-----------------
primitive  object  method
---------  ------  ---------------------------------
 100%       100%   Java indexed for loop (reference)
 100%       100%   Scala while loop
 100%       100%   Scala recursion (either way)
 185%       135%   Scala for comprehension on indices
2100%       130%   Scala zipped
3700%       800%   Scala zip

This is particularly bad, of course, with primitives! (You get similarly huge jumps in time taken if you try to use ArrayLists of Integer instead of Array of int in Java.) Note in particular that zipped is quite a reasonable choice if you have objects stored.

Do beware of premature optimization, though! There are advantages to in clarity and safety to functional forms like zip. If you always write while loops because you think "every little bit helps", you're probably making a mistake because it takes more time to write and debug, and you could be using that time optimizing some more important part of your program.


But, assuming your arrays are the same length is dangerous. Are you sure? How much effort will you make to be sure? Maybe you shouldn't make that assumption?

If you don't need it to be fast, just correct, then you have to choose what to do if the two arrays are not the same length.

If you want to do something with all the elements up to the length of the shorter, then zip is still what you use:

// The second is just shorthand for the first
(aList zip bList).foreach{ case (a,b) => ??? }
for ((a,b) <- (aList zip bList)) { ??? }

// This avoids an intermediate array
(aList, bList).zipped.foreach{ (a,b) => ??? }

If you instead want to pad the shorter one with a default value, you would

aList.zipAll(bList, aDefault, bDefault).foreach{ case (a,b) => ??? }
for ((a,b) <- aList.zipAll(bList, aDefault, bDefault)) { ??? }

In any of these cases, you can use yield with for or map instead of foreach to generate a collection.

If you need the index for a calculation or it really is an array and you really need it to be fast, you will have to do the calculation manually. Padding missing elements is awkward (I leave that as an exercise to the reader), but the basic form would be:

for (i <- 0 until math.min(aList.length, bList.length)) { ??? }

where you then use i to index into aList and bList.

If you really need maximum speed you would again use (tail) recursion or while loops:

val n = math.min(aList.length, bList.length)
var i = 0
while (i < n) {
  ???
  i += 1
}

def loop(i: Int) {
  if (i < aList.length && i < bList.length) {
    ???
    loop(i+1)
  }
}
loop(0)
like image 88
Rex Kerr Avatar answered Sep 22 '22 08:09

Rex Kerr