 

Creating an O(1)-memory Iterable from an initial object and a function which generates the next object, in Scala

I want a convenient way to generate an Iterable, given an initial object and a function that produces the next object from the current one, that consumes O(1) memory (i.e., it doesn't cache old results; if you want to iterate a second time, the function has to be applied again).

It doesn't seem like there's library support for this. In Scala 2.8, the method scala.collection.Iterable.iterate has signature

def iterate [A] (start: A, len: Int)(f: (A) ⇒ A) : Iterable[A]

so it requires that you specify how many iterated function applications you're interested in ahead of time, and my understanding of the documentation is that Iterable.iterate actually computes all these values immediately. On the other hand, the method scala.collection.Iterator.iterate has signature

def iterate [T] (start: T)(f: (T) ⇒ T) : Iterator[T]

which looks great, but this only gives us an Iterator, which doesn't offer all the convenience of map, filter and friends.
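
For example, here's a small sketch of what Iterator.iterate gives me (assuming Scala 2.x; the names are just illustrative): it is lazy and O(1), but the resulting Iterator can only be traversed once.

val powers = Iterator.iterate(1L)(_ * 2)   // 1, 2, 4, 8, ...
println(powers.take(5).toList)             // List(1, 2, 4, 8, 16)
// `powers` has now been partly consumed; traversing the sequence again
// means rebuilding the iterator from scratch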

Is there a convenient library method to produce what I want?

and if not,

Can someone suggest the 'colloquial' Scala code for doing this?

To summarise, given an initial object a: A, and a function f: A => A, I'd like a TraversableLike (e.g., probably an Iterable) that generates a, f(a), f(f(a)), ..., and uses O(1) memory, with map, filter etc. functions that also return something that is O(1) in memory.

Asked Sep 24 '10 by Scott Morrison


2 Answers

Stream will do what you want; just don't hold on to the cells, only iterate over the values.

It is a sadly common misconception floating around that Streams inherently cache every value they compute.

If you write this:

val s1: Stream[Thing] = initialValue #:: «expression computing next value»

then indeed every value produced by the stream is retained, but this is not necessary. If you write:

def s2: Stream[Thing] = initialValue #:: «expression computing next value»

and if the caller just iterates over the stream's values but does not remember the Stream value itself (specifically any of its cons cells), then no unwanted retention will occur. Of course, in this formulation, every call creates a new Stream starting from a fixed initial value. That's not necessary:

def s3(start: Thing): Stream[Thing] = start #:: «expression computing next value»
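
For instance, a minimal concrete version of that last form, using Int as a stand-in for Thing (in Scala 2.13 the same pattern works with LazyList):

def countFrom(start: Int): Stream[Int] = start #:: countFrom(start + 1)

// map, filter and take on a Stream are lazy and return Streams, so this stays
// constant-memory as long as nothing holds a reference to the head:
countFrom(0).map(_ * 2).filter(_ % 3 == 0).take(5).foreach(println)
// prints 0, 6, 12, 18, 24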

The one thing you have to watch out for is passing a Stream to a method. Doing so captures the head of the stream in the method's parameter, which keeps the whole prefix reachable for the duration of the call. One way around this is to process the stream with tail-recursive code.
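
As a sketch of that last point (drain is a hypothetical helper, not a library method): a tail-recursive consumer rebinds its parameter to the tail on each step, so the cells it has already processed become unreachable and can be garbage collected.

import scala.annotation.tailrec

object StreamConsumer {
  @tailrec
  def drain[A](s: Stream[A])(handle: A => Unit): Unit =
    if (s.nonEmpty) { handle(s.head); drain(s.tail)(handle) }
}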

Answered by Randall Schulz


Iterator.iterate demo with filter:

object I {
  def main(args: Array[String]): Unit = {
    val mb = 1024 * 1024
    // each array holds 10 MB worth of Ints (4 bytes each), so retaining old
    // results would quickly exhaust a small heap
    val gen = Iterator.iterate(new Array[Int](10 * mb / 4)) { arr =>
      val res = new Array[Int](10 * mb / 4)
      arr.copyToArray(res)
      println("allocated 10mb")
      res(0) = arr(0) + 1 // store the iteration count in the first element of the new array
      res
    }
    // keep 1 out of every 100 iterations
    val gen2 = gen filter (arr => arr(0) % 100 == 0)
    // print the first 10 filtered results
    gen2.take(10).foreach { arr => println("filtered " + arr(0)) }
  }
}

(This may not work in the REPL, because the REPL's result-printing step keeps references to intermediate values, which can interfere with the constant-memory behavior.)

Running JAVA_OPTS="-Xmx128m" scala -cp classes I shows that the filtering works and is lazy. If the iteration were not done in constant memory, it would cause a heap error, since something like 900 * 10 MB is allocated over the whole run.

Use JAVA_OPTS="-Xmx128m -verbose:gc" scala -cp classes I to see the garbage collection events.
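
If you specifically want an Iterable rather than an Iterator, one possible sketch (the iterate helper below is illustrative, not a library method) is to wrap Iterator.iterate so a fresh iterator is built on every traversal. Since the sequence is infinite, go through view (or the underlying iterator) to keep map and filter lazy; calling them directly on the Iterable would be strict and never terminate.

def iterate[A](start: A)(f: A => A): Iterable[A] =
  new Iterable[A] { def iterator: Iterator[A] = Iterator.iterate(start)(f) }

val naturals = iterate(0)(_ + 1)
// the view keeps map/filter lazy and O(1); take(5) makes the result finite
println(naturals.view.map(_ * 2).filter(_ % 3 == 0).take(5).toList)
// List(0, 6, 12, 18, 24)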

Answered by huynhjl