I am comfortable with streams, but I admit I am puzzled by this behavior:
import collection.immutable.Stream
object StreamForceTest extends App {
println("Computing fibs")
val fibs: Stream[BigInt] = BigInt(0) #:: BigInt(1) #::
fibs.zip(fibs.tail).map((x: (BigInt, BigInt)) => {
println("Adding " + x._1 + " and " + x._2);
x._1 + x._2
})
println("Taking first 5 elements")
val fibs5 = fibs.take(5)
println("Computing length of that prefix")
println("fibs5.length = " + fibs5.length)
}
with output
Computing fibs
Taking first 5 elements
Computing length of that prefix
Adding 0 and 1
Adding 1 and 1
Adding 1 and 2
fibs5.length = 5
Why should take(5) not force the stream's values to be computed,
while length does do so? Offhand neither one needs to actually
look at the values, but I would have thought that take was more
likely to do it than length. Inspecting the source code on github,
we find these definitions for take (including an illuminating
comment):
override def take(n: Int): Stream[A] = (
// Note that the n == 1 condition appears redundant but is not.
// It prevents "tail" from being referenced (and its head being evaluated)
// when obtaining the last element of the result. Such are the challenges
// of working with a lazy-but-not-really sequence.
if (n <= 0 || isEmpty) Stream.empty
else if (n == 1) cons(head, Stream.empty)
else cons(head, tail take n-1)
)
and length:
override def length: Int = {
var len = 0
var left = this
while (!left.isEmpty) {
len += 1
left = left.tail
}
len
}
The definition of head and tail is obtained from the specific
subclass (Empty and Cons). (Of course Empty is an object, not a
class, and its definitions of head and tail just throw
exceptions.) There are subtleties, but they seem to concern making
sure that the tail of a Cons is evaluated lazily; the head
definition is straight out of lecture 0 on Scala constructors.
Note that length doesn't go near head, but it's the one that
does the forcing.
All this is part of a general puzzlement about how close Scala streams are to Haskell lists. I thought Haskell treated head and tail symmetrically (I'm not a serious Haskell hacker), and Scala forced head evaluation in more circumstances. I'm trying to figure out exactly what those circumstances are.
Stream's head is strict and its tail is lazy, as you can see in cons.apply and in the Cons constructor:
def apply[A](hd: A, tl: => Stream[A]) = new Cons(hd, tl)
class Cons[+A](hd: A, tl: => Stream[A]) extends Stream[A]
Notice the context in which the take method refers to tail:
cons(head, tail take n-1)
Because the expression tail take n-1 is used as the second argument to cons, which is passed by name, it doesn't force evaluation of tail take n-1, thus doesn't force evaluation of tail.
Whereas in length, the statement
left = left.tail
, by assigning left.tail to a var, does force its evaluation.
Scala is "strict by default". In most situations, everything you reference will be evaluated. We only have lazy evaluation in cases where a method/constructor parameter declares an call-by-name argument with =>, and in the culture we don't typically use this unless there's a special reason.
Let me offer another answer to this, one that just looks from a high level, i.e. without actually considering the code.
If you want to know how long a Stream is, you must evaluate it all the way to the end. Otherwise, you can only guess at its length. Admittedly, you may not actually care about the values (since you only want to count them) but that's immaterial.
On the other hand, when you "take" a certain number of elements from a stream (or indeed any collection) you are simply saying that you want at most that number of elements. The result is still a stream even though it may have been truncated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With