How should I avoid unintentionally capturing the local scope in function literals?

Tags: scope, scala, lazy-evaluation, function-literal

I'll ask this with a Scala example, but it may well be that this affects other languages which allow hybrid imperative and functional styles.

Here's a short example (UPDATED, see below):

def method: Iterator[Int] = {
    // construct some large intermediate value
    val huge = (1 to 1000000).toList
    val small = List.fill(5)(scala.util.Random.nextInt())
    // accidentally use huge in a function literal
    small.iterator filterNot ( huge contains _ )
}

Now Iterator.filterNot works lazily, which is great! As a result, we'd expect the returned iterator to consume very little memory (indeed, O(1)). Sadly, however, we've made a terrible mistake. Since filterNot is lazy, it keeps a reference to the function literal huge contains _, and that function literal in turn holds a reference to huge.

Thus, while we thought the method would need a large amount of memory only while it was running, and that this memory could be freed as soon as the method returned, in fact it cannot be reclaimed until the returned Iterator itself becomes unreachable.
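To see why the memory is retained, it may help to write out by hand roughly what the compiler produces for huge contains _. That is a function object that stores a reference to huge. The class name ContainsIn is hypothetical. The real generated code differs by Scala version (anonymous classes in older releases, LambdaMetafactory-based lambdas in 2.12+), but in both cases the captured value lives inside the function object:

```scala
// Hypothetical hand-written equivalent of the function literal `huge contains _`.
// The compiler-generated version differs in detail, but it likewise keeps `huge` reachable.
final class ContainsIn(huge: List[Int]) extends (Int => Boolean) {
  def apply(x: Int): Boolean = huge.contains(x)
}

object CaptureDemo {
  def method: Iterator[Int] = {
    val huge = (1 to 1000000).toList
    val small = List.fill(5)(scala.util.Random.nextInt())
    // The lazy iterator stores this predicate, and the predicate stores `huge`,
    // so `huge` stays reachable for as long as the returned iterator does.
    small.iterator.filterNot(new ContainsIn(huge))
  }
}
```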

(I just made such a mistake, and it took a long time to track down! You can catch such things by looking at heap dumps...)

What are best practices for avoiding this problem?

It seems that the only solution is to check carefully for function literals that outlive the scope in which they were created and that capture intermediate variables. This is a bit awkward if you're constructing a non-strict collection and planning to return it. Can anyone think of some nice tricks, Scala-specific or otherwise, that avoid this problem and still let me write nice code?
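One simple trick is sketched below. It assumes the filtered result is small enough to materialize. Do the filtering with the strict List.filterNot inside the method, and only then call .iterator. The predicate, and with it huge, can be discarded as soon as filterNot returns, so the returned iterator only references the filtered list. The trade-off is that this step is no longer lazy.

```scala
object StrictFix {
  def method: Iterator[Int] = {
    // construct some large intermediate value
    val huge = (1 to 1000000).toList
    val small = List.fill(5)(scala.util.Random.nextInt())
    // List.filterNot is strict: it runs now, and the result does not retain
    // the closure over `huge`. The returned iterator only walks the filtered list.
    small.filterNot(x => huge.contains(x)).iterator
  }
}
```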

UPDATE: the example I'd given previously was stupid, as huynhjl's answer below demonstrates. It had been:

def method: Iterator[Int] = {
    val huge = (1 to 1000000).toList // construct some large intermediate value
    val n = huge.last                // do some calculation based on it
    (1 to n).iterator map (_ + 1)    // return some small value
}

In fact, now that I understand a bit better how these things work, I'm not so worried!
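The older example was harmless because its lazy part, (1 to n).iterator map (_ + 1), only needs the Int n and not huge. The same shape can be applied to the filter example, as sketched below (one possible structure, not the only one):

1. Reduce huge to the small value the lazy pipeline actually needs.
2. Build the lazy iterator in a separate method. Its parameters are the only things its closures can capture.

The helper name lazyExclude is hypothetical.

```scala
object DerivedValueFix {
  // Only `xs` and `banned` are in scope here, so the returned iterator cannot
  // accidentally capture a large local from the caller.
  def lazyExclude(xs: List[Int], banned: Set[Int]): Iterator[Int] =
    xs.iterator.filterNot(banned) // a Set[Int] is itself an Int => Boolean

  def method: Iterator[Int] = {
    val huge = (1 to 1000000).toList
    val small = List.fill(5)(scala.util.Random.nextInt())
    // Reduce `huge` to at most five values before building the lazy iterator.
    val banned: Set[Int] = small.filter(x => huge.contains(x)).toSet
    lazyExclude(small, banned)
  }
}
```

Because the lazy filtering lives in lazyExclude, its signature documents exactly what the returned iterator keeps alive.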


asked Oct 18 '10 04:10

Scott Morrison

1 Answer

Are you sure you're not oversimplifying the test case? Here is what I run:

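object Clos {
  // The question's earlier example: the returned iterator only needs the Int `n`.
  def method: Iterator[Int] = {
    val huge = (1 to 2000000).toList
    val n = huge.last
    (1 to n).iterator map (_ + 1)
  }

  def gc(): Unit = { println("GC!!"); Runtime.getRuntime.gc() }

  def main(args: Array[String]): Unit = {
    val list = List(method, method, method)
    list.foreach(m => println(m.next()))
    gc()
    // the iterators are still used after gc(), so anything they reference stays reachable
    list.foreach(m => println(m.next()))
    list.foreach(m => println(m.next()))
  }
}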

If I understand you correctly, you'd expect the JVM to hold onto the huge objects, because main is still using the iterators after the gc() call.

This is how I run it:

JAVA_OPTS="-verbose:gc" scala -cp classes Clos

This is what it prints towards the end:

[Full GC 57077K->57077K(60916K), 0.3340941 secs]
[Full GC 60852K->60851K(65088K), 0.3653304 secs]
2
2
2
GC!!
[Full GC 62959K->247K(65088K), 0.0610994 secs]
3
3
3
4
4
4

So it looks to me as if the huge objects were reclaimed...
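The same -verbose:gc experiment could be pointed at the updated example from the question, where the function literal does capture huge. The sketch below assumes that goal. ClosCapture and step are hypothetical names, and no GC output is claimed here. If the question's diagnosis is right, the Full GC after GC!! should not fall back near the baseline, since each of the three live iterators would still reference its own huge list.

```scala
object ClosCapture {
  // The updated example from the question: the lazy filterNot keeps a
  // reference to the function literal, which captures `huge`.
  def method: Iterator[Int] = {
    val huge = (1 to 2000000).toList
    val small = List.fill(5)(scala.util.Random.nextInt())
    small.iterator.filterNot(huge.contains(_))
  }

  def gc(): Unit = { println("GC!!"); Runtime.getRuntime.gc() }

  // Print the next element if there is one (a filtered iterator may be shorter than 5).
  def step(its: List[Iterator[Int]]): Unit =
    its.foreach(m => if (m.hasNext) println(m.next()))

  def main(args: Array[String]): Unit = {
    val list = List(method, method, method)
    step(list)
    gc()
    // The iterators are still in use after gc(), so whatever they reference stays live.
    step(list)
    step(list)
  }
}
```

It can be run the same way as above, e.g. JAVA_OPTS="-verbose:gc" scala -cp classes ClosCapture.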

answered Oct 11 '22 08:10

huynhjl
