Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala Streams: how to avoid to keeping a reference to the head (and other elements)

Suppose I have a tail recursive method that loops over the elements of a Stream, like this (simplified code, not tested):

@tailrec
def loop(s: Stream[X], acc: Y): Y = 
  s.headOption match {
    case None => acc
    case Some(x) => loop(s.tail, accumulate(x, acc))
  }

Am I keeping references to the head (and all other elements) of the stream while iterating, which I know should be avoided? If so, then what is a better way to achieve the same thing?

The code that calls this is (I hope) not keeping a reference. Suppose list is a List[X] then the code is calling

loop(list.sliding(n).toStream, initialY)

EDIT: I know this can be done easily without tail recursion (e.g. using foldLeft) but the non-simplified code is not looping exactly one element at a time (sometimes s is used instead of s.tail and sometimes s.tail.dropWhile(...) is used. So I'm looking to find out how to correctly use Stream.

like image 250
herman Avatar asked Jan 15 '14 15:01

herman


2 Answers

tl;dr: your method loop is correct, there will be no reference to the head of the stream. You could test it on infinite Stream.

Let's simplify your code sample to the limit:

class Test {
  private[this] var next: Test = _

  final def fold(): Int = {
    next = new Test
    next.fold()
  }
}

Note that your loop method is also a method of some object.

The method is final (just like Stream#foldLeft) - it's very important.

With scalac -Xprint:all test.scala after tail recursion optimization you'll get this:

final def fold(): Int = {
  <synthetic> val _$this: Test = Test.this;
  _fold(_$this: Test){
    ({
      Test.this.next = new Test();
      _fold(Test.this.next)
    }: Int)
  }
};

And this code will not help you to understand what is going on.

The only road to the magical land of understanding is the java bytecode.

But you should remember one thing: there is no such thing as method of object. All methods are "static". And this is just the first parameter of the method. If the method is virtual there is such thing as vtable, but our method is final, so there will be no dynamic dispatch in this case.

Also note that there is no such thing as parameter: all parameters are just variables, initialized before method execution.

So this is just the first variable (index 0) of the method.

Let's take a look at the bytecode (javap -c Test.class):

public final int fold();
  Code:
     0: aload_0       
     1: new           #2                  // class Test
     4: dup           
     5: invokespecial #16                 // Method "<init>":()V
     8: putfield      #18                 // Field next:LTest;
    11: aload_0       
    12: getfield      #18                 // Field next:LTest;
    15: astore_0      
    16: goto          0

Let's write this method in pseudo scala-like code:

static foo(var this: Test): Int {
  :start // label for goto jump

  // place variable `this` onto the stack:
  //   0: aload_0       

  // create new `Test`
  //   1: new           #2                  // class Test
  // invoke `Test` constructor
  //   4: dup           
  //   5: invokespecial #16                 // Method "<init>":()V

  // assign `this.next` field value
  //   8: putfield      #18                 // Field next:LTest;

  this.next = new Test

  // place `this.next` onto the stack
  //  11: aload_0       
  //  12: getfield      #18                 // Field next:LTest;

  // assign `this.next` to variable `this`!
  //  15: astore_0      
  this = this.next // we have no reference to the previous `this`!

  //  16: goto          0
  goto :start
}

After this = this.next we have no reference to the previous this on stack or in the first variable. And the previous this can be removed by GC!

So tail.foldLeft(...) in Stream#foldLeft will be replaced with this = this.tail, ...; goto :start. And since this is just a firs argument of method @tailrec before foldLeft declaration makes sense.

And now we can finally understand scalac -Xprint:all test.scala result:

final def method(a: A, b: B, ...): Res = {
  <synthetic> val _$this: ThisType = ThisType.this;
  _method(_$this: Test, a: A, b: B, ...){
    ({
      // body
      _method(nextThis, nextA, nextB, ...)
    }: Res)
  }
};

means:

final def method(var this: ThisType, var a: A, var b: B, ...): Res = {
  // _method(_$this: Test, a: A, b: B, ...){
  :start

  // body

  //   _method(nextThis, nextA, nextB, ...)
  this = nextThis
  a = nextA
  b = nextB
  ...
  goto :start
};

And this is exactly what you'll get after scalac -Xprint:all on your loop method, but body will be huge. So in your case:

...
case Some(x) =>
  this = this
  s = s.tail
  acc = accumulate(x, acc)
  goto :start
...

After s = s.tail you have no reference to the head of the stream.

like image 178
senia Avatar answered Nov 01 '22 07:11

senia


In most cases where this question comes up, the more important question is "does the calling code hang onto the head of stream?"

What really matters is that you passed the output from another method directly to loop, rather than assigning it to a val first.

That said, I would just avoid all possible confusion by using a simpler approach:

list.sliding(n).foldLeft(initialY)(accumulate)
like image 40
Kevin Wright Avatar answered Nov 01 '22 08:11

Kevin Wright