Scala: collecting updates/changes of immutable state

Tags:

I'm currently trying to apply a more functional programming style to a project involving low-level (LWJGL-based) GUI development. Obviously, in such a case it is necessary to carry around a lot of state, which is mutable in the current version. My goal is to eventually have a completely immutable state, in order to avoid state changes as side effect. I studied scalaz's lenses and state monads for awhile, but my main concern remains: All these techniques rely on copy-on-write. Since my state has both a large number of fields and also some fields of considerable size, I'm worried about performance.

To my knowledge the most common approach to modify immutable objects is to use the generated copy method of a case class (this is also what lenses do under the hood). My first question is, how this copy method is actually implemented? I performed a few experiments with a class like:

case class State(
  innocentField: Int, 
  largeMap: Map[Int, Int], 
  largeArray: Array[Int]
)

By benchmarking and also by looking at the output of -Xprof it looks like updating someState.copy(innocentField = 42) actually performs a deep copy and I observe a significant performance drop when I increase the size of largeMap and largeArray. I was somehow expecting that the newly constructed instance shares the object references of the original state, since internally the reference should just get passed to the constructor. Can I somehow force or disable this deep copy behaviour of the default copy?

While pondering on the copy-on-write issue, I was wondering whether there are more general solutions to this problem in FP, which store changes of immutable data in a kind of incremental way (in the sense of "collecting updates" or "gathering changes"). To my surprise I could not find anything, so I tried the following:

// example state with just two fields
trait State {
  def getName: String
  def getX: Int

  def setName(updated: String): State = new CachedState(this) {
    override def getName: String = updated
  }
  def setX(updated: Int): State = new CachedState(this) {
    override def getX: Int = updated
  }

  // convenient modifiers
  def modName(f: String => String) = setName(f(getName))
  def modX(f: Int => Int) = setX(f(getX))

  def build(): State = new BasicState(getName, getX)
}

// actual (full) implementation of State
class BasicState(
  val getName: String, 
  val getX: Int
) extends State


// CachedState delegates all getters to another state
class CachedState(oldState: State) extends State {
  def getName = oldState.getName
  def getX    = oldState.getX
}

Now this allows to do something like this:

var s: State = new BasicState("hello", 42)

// updating single fields does not copy
s = s.setName("world")
s = s.setX(0)

// after a certain number of "wrappings"
// we can extract (i.e. copy) a normal instance
val ns = s.setName("ok").setX(40).modX(_ + 2).build()

My question now is: What do you think of this design? Is this some kind of FP design pattern that I'm not aware of (apart from the similarity to the Builder pattern)? Since I have not found anything similar, I'm wondering if there is some major issue with this approach? Or are there any more standard ways to solve the copy-on-write bottleneck without giving up immutability?

Is there even a possibility to unify the get/set/mod functions in some way?

Edit:

My assumption that copy performs a deep copy was indeed wrong.

972

asked Nov 06 '12 22:11

bluenote10

2 Answers

This is basically the same as views and is a type of lazy evaluation; this type of strategy is more or less the default in Haskell, and is used in Scala a fair bit (see e.g. mapValues on maps, grouped on collections, pretty much anything on Iterator or Stream that returns another Iterator or Stream, etc.). It is a proven strategy to avoid extra work in the right context.

But I think your premise is somewhat mistaken.

case class Foo(bar: Int, baz: Map[String,Boolean]) {}
Foo(1,Map("fish"->true)).copy(bar = 2)

does not in fact cause the map to be copied deeply. It just sets references. Proof in bytecode:

62: astore_1
63: iconst_2   // This is bar = 2
64: istore_2
65: aload_1
66: invokevirtual   #72; //Method Foo.copy$default$2:()Lscala/collection/immutable/Map;
69: astore_3   // That was baz
70: aload_1
71: iload_2
72: aload_3
73: invokevirtual   #76; //Method Foo.copy:(ILscala/collection/immutable/Map;)LFoo;

And let's see what that copy$default$2 thing does:

0:  aload_0
1:  invokevirtual   #50; //Method baz:()Lscala/collection/immutable/Map;
4:  areturn

Just returns the map.

And copy itself?

0:  new #2; //class Foo
3:  dup
4:  iload_1
5:  aload_2
6:  invokespecial   #44; //Method "<init>":(ILscala/collection/immutable/Map;)V
9:  areturn

Just calls the regular constructor. No cloning of the map.

So when you copy, you create exactly one object--a new copy of what you're copying, with fields filled in. If you have a large number of fields, your view will be faster (as you have to create one new object (two if you use the function application version, since you need to create the function object also) but it has only one field). Otherwise it should be about the same.

So, yes, good idea potentially, but benchmark carefully to be sure it's worth it in your case--you have to write a fair bit of code by hand instead of letting the case class do it all for you.

182

answered Oct 11 '22 12:10

Rex Kerr

I tried to write a (quite rough) test for timing performances on your case class copy operation.

object CopyCase {

    def main(args: Array[String]) = {

        val testSizeLog = byTen(10 #:: Stream[Int]()).take(6).toList
        val testSizeLin = (100 until 1000 by 100) ++ (1000 until 10000 by 1000) ++ (10000 to 40000 by 10000)

        //warmUp
        runTest(testSizeLin)
        //test with logarithmic size increments 
        val times = runTest(testSizeLog)
        //test with linear size increments 
        val timesLin = runTest(testSizeLin)

        times.foreach(println)
        timesLin.foreach(println)
    }

    //The case class to test for copy
    case class State(
        innocentField: Int, 
        largeMap: Map[Int, Int], 
        largeArray: Array[Int]
    )

    //executes the test
    def runTest(sizes: Seq[Int]) = 
        for {
            s <- sizes
            st = State(s, largeMap(s), largeArray(s))
            //(time, state) = takeTime (st.copy(innocentField = 42)) //single run for each size
            (time, state) = mean(st.copy(innocentField = 42))(takeTime) //mean time on multiple runs for each size
        } yield (s, time)

    //Creates the stream of 10^n  with n = Naturals+{0}
    def byTen(s: Stream[Int]): Stream[Int] = s.head #:: byTen(s map (_ * 10))

    //append the execution time to the result
    def takeTime[A](thunk: => A): (Double, A) = {
        import System.{currentTimeMillis => millis, nanoTime => nanos}
        val t0:Double = nanos
        val res = thunk
        val time = ((nanos - t0) / 1000)
        (time, res)
    }

    //does a mean on multiple runs of the first element of the pair 
    def mean[A](thunk: => A)(fun: (=> A) => (Double, A)) = {
        val population = 50
        val mean = ((for (n <- 1 to population) yield fun(thunk)) map (_._1) ).sum / population
        (mean, fun(thunk)._2)
    }

    //Build collections for the requested size
    def largeMap(size: Int) = (for (i <- (1 to size)) yield (i, i)).toMap
    def largeArray(size: Int) = Array.fill(size)(1)
}

On this machine:

CPU: 64bits dual-core-i5 3.10GHz
RAM: 8GB ram
OS: win7
Java: 1.7
Scala: 2.9.2

I have the following results, which looks like pretty regular to me.

(size, millisecs to copy)
(10,0.4347000000000001)
(100,0.4412600000000001)
(1000,0.3953200000000001)
(10000,0.42161999999999994)
(100000,0.4478600000000002)
(1000000,0.42816000000000015)
(100,0.4084399999999999)
(200,0.41494000000000014)
(300,0.42156000000000016)
(400,0.4281799999999999)
(500,0.42160000000000003)
(600,0.4347200000000001)
(700,0.43466000000000016)
(800,0.41498000000000007)
(900,0.40178000000000014)
(1000,0.44134000000000007)
(2000,0.42151999999999995)
(3000,0.42148)
(4000,0.40842)
(5000,0.38860000000000006)
(6000,0.4413600000000001)
(7000,0.4743200000000002)
(8000,0.44795999999999997)
(9000,0.45448000000000005)
(10000,0.45448)
(20000,0.4281600000000001)
(30000,0.46768)
(40000,0.4676200000000001)

Maybe you have different performance measurements in mind.

Or could it be that your profiled times are actually spent on generating the Map and the Array, instead of copying the case class?

answered Oct 11 '22 11:10

pagoda_5b

Related questions
                            
                                Testing with probabilistic failure of components in Akka (Scala)
                            
                                Spark Random Forests: Different results with same seed
                            
                                Polymorphic updates in an immutable class hierarchy
                            
                                Scala / Slick, "Timeout after 20000ms of waiting for a connection" error
                            
                                scala actors vs threads and blocking IO
                            
                                What is the best way to perform OAuth2 authentication using akka-http?
                            
                                Optimizing Slick generated SQL query
                            
                                (SBT) How to disable default resolver and only use the company internal resolver?
                            
                                JVM OutOfMemory error "death spiral" (not memory leak)
                            
                                Enumeration concept in Scala - Which option to take?
                            
                                Spark: efficiency of dataframe checkpoint vs. explicitly writing to disk
                            
                                Getting the value of a SettingKey[T]
                            
                                Validation versus disjunction
                            
                                Function composition, Kleisli arrow, and Monadic laws
                            
                                Does collect_list() maintain relative ordering of rows?
                            
                                How do I access default parameter values via Scala reflection?
                            
                                Play JSON Reads/Writes with single-parameter case classes
                            
                                Wait on asynchronous Future call before processing next message in Akka
                            
                                Maturity of Scala-LLVM? [closed]
                            
                                Scala: order of definition for companion object vs case class

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Scala: collecting updates/changes of immutable state

Tags:

design-patterns

functional-programming

state

scala

immutability

bluenote10

People also ask

2 Answers

Rex Kerr

pagoda_5b

Recent Activity

Donate For Us