Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write non-leaking tail-recursive function using Stream.cons in Scala?

When writing a function operating on Stream(s), there are different notions of recursion. The first simple sense is not recursive on the compiler level, since the tail if not evaluated instantly so the function returns immediately, but the returned stream is recursive:

final def simpleRec[A](as: Stream[A]): Stream[B] = 
  if (a.isEmpty) Stream.empty              
  else someB(a.head) #:: simpleRec(a.tail) 

The above notion of recursion doesn't cause any problems. The second one is truly tail-recursive on the compiler level:

@tailrec
final def rec[A](as: Stream[A]): Stream[B] = 
  if (a.isEmpty) Stream.empty              // A) degenerated
  else if (someCond) rec(a.tail)           // B) tail recursion
  else someB(a.head) #:: rec(a.tail)       // C) degenerated

The problem here is that the C) case is detected by the compiler as a non-tailrec call, even if there is no actual call carried out. This can be avoided by factoring out the stream tail into a helper function:

@tailrec
final def rec[A](as: Stream[A]): Stream[B] = 
  if (a.isEmpty) Stream.empty              
  else if (someCond) rec(a.tail)          // B)
  else someB(a.head) #:: recHelp(a.tail)  

@tailrec
final def recHelp[A](as: Stream[A]): Stream[B] = 
  rec(as)

While it compiles, this approach eventually results in a memory leak. Since the tail-recursive rec is eventually called from the recHelp function, the stack frame of the recHelp function holds a reference to the steam head, and doesn't let the stream to be garbage collected until the rec call returns, which can quite long (in terms of recursion steps) depending on the number of calls to B).

Note that even in the helperless case, if the compiler allowed the @tailrec, the memory leak may still be present since the lazy stream tail would in effect create an anonymous object holding reference to the stream head.

like image 625
ron Avatar asked Sep 21 '12 11:09

ron


2 Answers

A possible workaround is to make the recHelp method not hold reference to the stream head. This can be achieved by passing a wrapped stream to it, and mutating the wrapper to erase the reference from it:

@tailrec
final def rec[A](as: Stream[A]): Stream[B] = 
  if (a.isEmpty) Stream.empty              
  else if (someCond) rec(a.tail)          
  else {
    // don't inline and don't define as def,
    // or anonymous lazy wrapper object would hold reference
    val tailRef = new AtomicReference(a.tail)
    someB(a.head) #:: recHelp(tailRef)  
  }

@tailrec
final def recHelp[A](asRef: AtomicReference[Stream[A]]): Stream[B] = 
  // Note: don't put the content of the holder into a local variable
  rec(asRef.getAndSet(null))

The AtomicReference is just convenience, atomicity is not required in this case, any simple holder object would do.

Also note that since recHelp is wrapped in a stream Cons tail, therefore it will only be evaluated once, and Cons also takes care of synchronization.

like image 121
ron Avatar answered Nov 16 '22 01:11

ron


The problem, as you've hinted, is that in the code you pasted the filterHelp function keeps the head (hence your solution removing it).

Best answer is to simply avoid this surprising behaviour, use Scalaz EphemeralStream and see it both not oom and run significantly faster as its far nicer to the gc. Its not always as simple to work with e.g. head is a () => A not A, no extractors etc, but its all geared to one objective, reliable stream usage.

Your filterHelper function generally doesn't have to care about if it keeps a reference around:

import scalaz.EphemeralStream

@scala.annotation.tailrec
def filter[A](s: EphemeralStream[A], f: A => Boolean): EphemeralStream[A] = 
  if (s.isEmpty) 
    s
  else
    if (f(s.head())) 
      EphemeralStream.cons(s.head(), filterHelp(s.tail() , f) )
    else
      filter(s.tail(), f)

def filterHelp[A](s: EphemeralStream[A], f: A => Boolean) =
  filter(s, f)

def s1 = EphemeralStream.range(1, big)

I'd go so far as to say that unless you have a compelling reason to use Stream (other library dependencies etc) then just stick to EphemeralStream, there are far less surprises there.

like image 45
Chris Avatar answered Nov 16 '22 03:11

Chris