Why doesn't Stream.filter run out of memory?

Tags:

scala

scala-collections

These two expressions should mean the same thing:

Stream.from(1).filter(_ < 0).head
Stream.from(1).find(_ < 0)

They should loop around until they return Int.MinValue. And that is exactly what the version with filter does, but with find an OutOfMemoryError is produced. Looking at their implementations, though, I can't figure out why both versions don't produce an OutOfMemoryError.
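To make the expected behaviour concrete, here is a minimal sketch that uses the same two call shapes with a predicate that is satisfied almost immediately, plus a check that Stream.from wraps from Int.MaxValue to Int.MinValue. The object name FilterVsFindSketch and the predicate _ > 1000 are assumptions chosen for illustration; they do not appear in the question.

```scala
object FilterVsFindSketch {
  // Same shape as the expressions in the question, but the predicate
  // matches after about a thousand elements, so both finish immediately.
  val viaFilter: Int = Stream.from(1).filter(_ > 1000).head
  val viaFind: Option[Int] = Stream.from(1).find(_ > 1000)

  // Stream.from adds 1 with ordinary Int arithmetic, so the element after
  // Int.MaxValue is Int.MinValue. That is why the search for a negative
  // number in the question is expected to end there.
  val aroundTheWrap: List[Int] = Stream.from(Int.MaxValue).take(2).toList
}
```

With a cheap predicate the two forms agree, apart from find wrapping its result in an Option. The difference described in the question only shows up when roughly 2^31 cells have to be walked before a match.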

Here is the implementation of Stream.filter:

override def filter(p: A => Boolean): Stream[A] = {
  // optimization: drop leading prefix of elems for which f returns false
  // var rest = this dropWhile (!p(_)) - forget DRY principle - GC can't collect otherwise
  var rest = this
  while (!rest.isEmpty && !p(rest.head)) rest = rest.tail
  // private utility func to avoid `this` on stack (would be needed for the lazy arg)
  if (rest.nonEmpty) Stream.filteredTail(rest, p)
  else Stream.Empty
}

find is inherited from LinearSeqOptimized, with this definition:

override /*IterableLike*/ def find(p: A => Boolean): Option[A] = {
  var these = this
  while (!these.isEmpty) {
    if (p(these.head)) return Some(these.head)
    these = these.tail
  }
  None
}

They both have a while loop that discards elements of the Stream that don't satisfy the predicate. Because this should maintain a reference to the beginning of the Stream, all of the elements created along the way should accumulate in memory until we run out of space. Unless I am really misunderstanding what is happening here, Stream.filter is somehow eliminating this from its stack frame before it enters the while loop. The comment in Stream.filter on why dropWhile isn't used looks like a hint, but I have no idea what it is referring to.
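The concern about a live reference to the first cell can be illustrated with a small sketch. It relies on the fact that Stream memoizes every cell it has forced, so anything that still points at the head keeps the whole evaluated prefix reachable. The names HeadRetentionSketch, start and evaluations are hypothetical; start simply stands in for this.

```scala
object HeadRetentionSketch {
  // Counts how many times the mapping function has actually run.
  var evaluations = 0

  // `start` plays the role of `this`: a reference to the first cell.
  val start: Stream[Int] = Stream.from(1).map { n => evaluations += 1; n }

  // The first traversal forces the first five cells.
  val firstPass: List[Int] = start.take(5).toList
  val afterFirstPass: Int = evaluations

  // The second traversal finds those cells already memoized, so the
  // mapping function does not run again.
  val secondPass: List[Int] = start.take(5).toList
  val afterSecondPass: Int = evaluations
}
```

Because the counter does not move on the second pass, the forced cells must still be held in memory through start. Scaled up to billions of cells, that retention is what would exhaust the heap.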

My next step would be learning how to disassemble and read JVM bytecode, but I'm really hoping someone knows what is happening here.


asked Apr 09 '14 15:04

wingedsubmariner

1 Answer

It's a combination of HotSpot and the way Scala's traits are implemented.

If I turn off HotSpot's JIT compilation with -Xint, Stream.filter will also die with an OutOfMemoryError. In the generated bytecode itself, this and the variables rest and these are stored in different memory locations, but because this is only used to initialize these variables, I believe HotSpot is smart enough to simply reuse the memory location for this. This explains why Stream.filter does not run out of memory.
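The loop shape this reasoning depends on can be sketched in isolation. In the hypothetical helper below, firstMatch and its parameter start are assumptions for illustration, with start standing in for this. The point is that start is read exactly once, to initialize rest, and is dead for the remainder of the method.

```scala
object DropPrefixSketch {
  // Mirrors the loop at the top of Stream.filter. `start` is read only
  // on the first line; after that, only `rest` is ever used.
  def firstMatch[A](start: Stream[A])(p: A => Boolean): Option[A] = {
    var rest = start
    while (rest.nonEmpty && !p(rest.head)) rest = rest.tail
    rest.headOption
  }
}
```

Following the answer's explanation, compiled code does not need to keep a dead start around, while an interpreted frame still holds it in its own slot. Running a loop like this over Stream.from(1) with and without the JVM's -Xint flag would be one way to repeat the experiment.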

HotSpot's optimization of Stream.filter should also apply to LinearSeqOptimized.find; however, because of the way traits are implemented, a reference to this is preserved. When a method is implemented inside a trait, Scala compiles that method into a static method. When a class inherits from that trait, Scala creates a small stub method that invokes the static method. So even though HotSpot optimizes the static method for LinearSeqOptimized.find, the stub method's stack frame still has a reference to this.
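The stub-plus-static-method arrangement can be approximated by hand. The sketch below is not actual compiler output: Cell, CellImpl, Cons and Empty are hypothetical names, and a Scala object method only stands in for a true static method. It shows why the caller's frame matters: the stub passes this along and then waits for the call to return, so its own frame still refers to the first cell.

```scala
object TraitEncodingSketch {
  // Minimal linear-sequence interface, for illustration only.
  trait Cell[A] {
    def isEmpty: Boolean
    def head: A
    def tail: Cell[A]
    def find(p: A => Boolean): Option[A]
  }

  // Stand-in for the static method the trait body is compiled into.
  // `self` is dead once `these` has been initialized.
  object CellImpl {
    def find[A](self: Cell[A], p: A => Boolean): Option[A] = {
      var these = self
      while (!these.isEmpty) {
        if (p(these.head)) return Some(these.head)
        these = these.tail
      }
      None
    }
  }

  final class Cons[A](val head: A, val tail: Cell[A]) extends Cell[A] {
    def isEmpty: Boolean = false
    // Stand-in for the stub: this frame holds `this` until the call returns.
    def find(p: A => Boolean): Option[A] = CellImpl.find(this, p)
  }

  final class Empty[A] extends Cell[A] {
    def isEmpty: Boolean = true
    def head: A = throw new NoSuchElementException("head of empty cell")
    def tail: Cell[A] = throw new NoSuchElementException("tail of empty cell")
    def find(p: A => Boolean): Option[A] = CellImpl.find(this, p)
  }

  val cells: Cell[Int] = new Cons(1, new Cons(2, new Cons(3, new Empty[Int])))
}
```

For example, TraitEncodingSketch.cells.find(_ > 1) goes through the stub in Cons and then into CellImpl.find. Even if the inner method drops self, the outer frame's this is a separate reference.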

answered Sep 18 '22 23:09

wingedsubmariner
