Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does Preserve sharing means in lazy Streams?

I'm following the Functional Programming in Scala book. Here is the code snippet from Stream definition and functions to construct constant using smart constructor and using unfold:

sealed trait Stream[+A]
case object Empty extends Stream[Nothing]
case class Cons[+A](h: () => A, tl: () => Stream[A]) extends Stream[A]

object Stream {
  def cons[A](h: => A, tl: => Stream[A]): Stream[A] = {
    lazy val head = h
    lazy val tail = tl
    Cons(() => head, () => tail)
  }
  def empty[A]: Stream[A] = Empty

  def constant[A](a: A): Stream[A] = cons(a, constant(a))

  def unfold[A, S](z: S)(f: S => Option[(A, S)]): Stream[A] =
    f(z).fold(empty[A])(x => cons(x._1, unfold(x._2)(f)))

  def constantViaUnfold[A](a: A): Stream[A] = unfold(a)(x => Some((x, x)))
}

There is a footnote saying that:

Using unfold to define constant means that we won't get sharing as in the recursive definition. The recursive definition consumes constant memory even if we keep a reference to it around while traversing it, while the unfold-based implementation does not. Preserving sharing isn’t something we usually rely on when programming with streams, since it’s extremely delicate and not tracked by the types. For instance, sharing is destroyed when calling even xs.map(x => x).

What do the authors mean when saying that we are not getting sharing? What exactly is shared and why it's not preserved in unfold version? Also the sentence about constant memory consumption is not clear as well.

like image 877
Izbassar Tolegen Avatar asked Apr 30 '19 11:04

Izbassar Tolegen


2 Answers

Let's say you create a new list like that:

val n0 = Nil       //List()
val n1 = 1 :: n0   //List(1)
val n2 = 2 :: n1   //List(2,1)
val n3 = 3 :: n2   //List(3,2,1)

If you build a list like that, it's easy to notice that n3 is really n2 with 3 prepended and n2 is just n1 with 2 prepended, etc. So reference to 1 is shared by n1, n2 and n3 and reference to 2 is shared by n2 and n2, etc. You could write it also like:

Cons(3, Cons(2, Cons(1, Nil)))

This is also the case in an example from FPinS when you create Stream recursively. You Stream is built from nested Cons and every substream shares elements with its parent. So when the next Stream is created it just wraps old one inside Cons with the new element. But since Stream is lazy all this building of Cons hierarchy will be only done if you materialize it, for example by calling toList.

Building stream like that gives you also constant memory consumption because creating the next stream from previous will only cost memory for a new element.

And why it is not a case with unfold? Because it builds Stream "other way around". So it goes like:

Cons(x, next_iteration_of_unfold)                    //1st iteration
Cons(x, Cons(x, next_iteration_of_unfold))           //2nd iteration
Cons(x, Cons(x, Cons(x, next_iteration_of_unfold)))  //3rd iteration, etc.

So as you can see nothing can be shared.

EDIT:

You can see how the materialized stream looks by calling take on final implementation on the book and adding toString to Cons:

override def toString: String = s"Cons(${h()}, ${tl()})"

And then:

Stream.ones.take(3)

You will see:

Cons(1, Cons(1, Cons(1, Empty)))
like image 116
Krzysztof Atłasik Avatar answered Oct 26 '22 07:10

Krzysztof Atłasik


I will provide the original book's (2014 year edition) quote to avoid confusion :

Using unfold to define constant and ones means that we don’t get sharing as in the recursive definition val ones: Stream[Int] = cons(1, ones). The recursive definition consumes constant memory even if we keep a reference to it around while traversing it, while the unfold-based implementation does not. Preserving sharing isn’t something we usually rely on when programming with streams, since it’s extremely delicate and not tracked by the types. For instance, sharing is destroyed when calling even xs.map(x => x).

As you can see the problem is a bit different here now. It's about reusing Cons instance, so Stream just reference to itself here.

So it was supposed by the author to have implementation like this:

//Imp1
def constant[A](a: A): Stream[A] = {
   lazy val ones: Stream[A] = cons(a, ones)
   ones
}

instead of just

//Imp2
def constant[A](a: A): Stream[A] = cons(a, constant(a))

As you can see Imp1 creates only one instance of Cons for entire stream. Imp2 and unfold version from your example generate new Cons on every next call instead, so both methods generate streams that don't share Cons instance.

like image 32
Bogdan Vakulenko Avatar answered Oct 26 '22 07:10

Bogdan Vakulenko