Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala: collecting updates/changes of immutable state

I'm currently trying to apply a more functional programming style to a project involving low-level (LWJGL-based) GUI development. Obviously, in such a case it is necessary to carry around a lot of state, which is mutable in the current version. My goal is to eventually have a completely immutable state, in order to avoid state changes as side effect. I studied scalaz's lenses and state monads for awhile, but my main concern remains: All these techniques rely on copy-on-write. Since my state has both a large number of fields and also some fields of considerable size, I'm worried about performance.

To my knowledge the most common approach to modify immutable objects is to use the generated copy method of a case class (this is also what lenses do under the hood). My first question is, how this copy method is actually implemented? I performed a few experiments with a class like:

case class State(
  innocentField: Int, 
  largeMap: Map[Int, Int], 
  largeArray: Array[Int]
)

By benchmarking and also by looking at the output of -Xprof it looks like updating someState.copy(innocentField = 42) actually performs a deep copy and I observe a significant performance drop when I increase the size of largeMap and largeArray. I was somehow expecting that the newly constructed instance shares the object references of the original state, since internally the reference should just get passed to the constructor. Can I somehow force or disable this deep copy behaviour of the default copy?

While pondering on the copy-on-write issue, I was wondering whether there are more general solutions to this problem in FP, which store changes of immutable data in a kind of incremental way (in the sense of "collecting updates" or "gathering changes"). To my surprise I could not find anything, so I tried the following:

// example state with just two fields
trait State {
  def getName: String
  def getX: Int

  def setName(updated: String): State = new CachedState(this) {
    override def getName: String = updated
  }
  def setX(updated: Int): State = new CachedState(this) {
    override def getX: Int = updated
  }

  // convenient modifiers
  def modName(f: String => String) = setName(f(getName))
  def modX(f: Int => Int) = setX(f(getX))

  def build(): State = new BasicState(getName, getX)
}

// actual (full) implementation of State
class BasicState(
  val getName: String, 
  val getX: Int
) extends State


// CachedState delegates all getters to another state
class CachedState(oldState: State) extends State {
  def getName = oldState.getName
  def getX    = oldState.getX
}

Now this allows to do something like this:

var s: State = new BasicState("hello", 42)

// updating single fields does not copy
s = s.setName("world")
s = s.setX(0)

// after a certain number of "wrappings"
// we can extract (i.e. copy) a normal instance
val ns = s.setName("ok").setX(40).modX(_ + 2).build()

My question now is: What do you think of this design? Is this some kind of FP design pattern that I'm not aware of (apart from the similarity to the Builder pattern)? Since I have not found anything similar, I'm wondering if there is some major issue with this approach? Or are there any more standard ways to solve the copy-on-write bottleneck without giving up immutability?

Is there even a possibility to unify the get/set/mod functions in some way?

Edit:

My assumption that copy performs a deep copy was indeed wrong.

like image 972
bluenote10 Avatar asked Nov 06 '12 22:11

bluenote10


People also ask

Why is everything in Scala immutable?

Immutable objects and data structures are first-class citizens in Scala. This is because they prevent mistakes in distributed systems and provide thread-safe data.

Is everything in Scala immutable?

In Scala, all number types, strings, and tuples are immutable. The classes Point, Date, Student, and Card we defined above are all immutable. In other words, once a Point object has been created, its fields cannot be modified.

Is list mutable or immutable in Scala?

Lists are immutable whereas arrays are mutable in Scala.

How can we convert mutable map to immutable Scala?

If you just want a mutable HashMap , you can just use x. toMap in 2.8 or collection. immutable. Map(x.


2 Answers

This is basically the same as views and is a type of lazy evaluation; this type of strategy is more or less the default in Haskell, and is used in Scala a fair bit (see e.g. mapValues on maps, grouped on collections, pretty much anything on Iterator or Stream that returns another Iterator or Stream, etc.). It is a proven strategy to avoid extra work in the right context.

But I think your premise is somewhat mistaken.

case class Foo(bar: Int, baz: Map[String,Boolean]) {}
Foo(1,Map("fish"->true)).copy(bar = 2)

does not in fact cause the map to be copied deeply. It just sets references. Proof in bytecode:

62: astore_1
63: iconst_2   // This is bar = 2
64: istore_2
65: aload_1
66: invokevirtual   #72; //Method Foo.copy$default$2:()Lscala/collection/immutable/Map;
69: astore_3   // That was baz
70: aload_1
71: iload_2
72: aload_3
73: invokevirtual   #76; //Method Foo.copy:(ILscala/collection/immutable/Map;)LFoo;

And let's see what that copy$default$2 thing does:

0:  aload_0
1:  invokevirtual   #50; //Method baz:()Lscala/collection/immutable/Map;
4:  areturn

Just returns the map.

And copy itself?

0:  new #2; //class Foo
3:  dup
4:  iload_1
5:  aload_2
6:  invokespecial   #44; //Method "<init>":(ILscala/collection/immutable/Map;)V
9:  areturn

Just calls the regular constructor. No cloning of the map.

So when you copy, you create exactly one object--a new copy of what you're copying, with fields filled in. If you have a large number of fields, your view will be faster (as you have to create one new object (two if you use the function application version, since you need to create the function object also) but it has only one field). Otherwise it should be about the same.

So, yes, good idea potentially, but benchmark carefully to be sure it's worth it in your case--you have to write a fair bit of code by hand instead of letting the case class do it all for you.

like image 182
Rex Kerr Avatar answered Oct 11 '22 12:10

Rex Kerr


I tried to write a (quite rough) test for timing performances on your case class copy operation.

object CopyCase {

    def main(args: Array[String]) = {

        val testSizeLog = byTen(10 #:: Stream[Int]()).take(6).toList
        val testSizeLin = (100 until 1000 by 100) ++ (1000 until 10000 by 1000) ++ (10000 to 40000 by 10000)

        //warmUp
        runTest(testSizeLin)
        //test with logarithmic size increments 
        val times = runTest(testSizeLog)
        //test with linear size increments 
        val timesLin = runTest(testSizeLin)

        times.foreach(println)
        timesLin.foreach(println)
    }

    //The case class to test for copy
    case class State(
        innocentField: Int, 
        largeMap: Map[Int, Int], 
        largeArray: Array[Int]
    )

    //executes the test
    def runTest(sizes: Seq[Int]) = 
        for {
            s <- sizes
            st = State(s, largeMap(s), largeArray(s))
            //(time, state) = takeTime (st.copy(innocentField = 42)) //single run for each size
            (time, state) = mean(st.copy(innocentField = 42))(takeTime) //mean time on multiple runs for each size
        } yield (s, time)

    //Creates the stream of 10^n  with n = Naturals+{0}
    def byTen(s: Stream[Int]): Stream[Int] = s.head #:: byTen(s map (_ * 10))

    //append the execution time to the result
    def takeTime[A](thunk: => A): (Double, A) = {
        import System.{currentTimeMillis => millis, nanoTime => nanos}
        val t0:Double = nanos
        val res = thunk
        val time = ((nanos - t0) / 1000)
        (time, res)
    }

    //does a mean on multiple runs of the first element of the pair 
    def mean[A](thunk: => A)(fun: (=> A) => (Double, A)) = {
        val population = 50
        val mean = ((for (n <- 1 to population) yield fun(thunk)) map (_._1) ).sum / population
        (mean, fun(thunk)._2)
    }

    //Build collections for the requested size
    def largeMap(size: Int) = (for (i <- (1 to size)) yield (i, i)).toMap
    def largeArray(size: Int) = Array.fill(size)(1)
}

On this machine:

  • CPU: 64bits dual-core-i5 3.10GHz
  • RAM: 8GB ram
  • OS: win7
  • Java: 1.7
  • Scala: 2.9.2

I have the following results, which looks like pretty regular to me.

(size, millisecs to copy)
(10,0.4347000000000001)
(100,0.4412600000000001)
(1000,0.3953200000000001)
(10000,0.42161999999999994)
(100000,0.4478600000000002)
(1000000,0.42816000000000015)
(100,0.4084399999999999)
(200,0.41494000000000014)
(300,0.42156000000000016)
(400,0.4281799999999999)
(500,0.42160000000000003)
(600,0.4347200000000001)
(700,0.43466000000000016)
(800,0.41498000000000007)
(900,0.40178000000000014)
(1000,0.44134000000000007)
(2000,0.42151999999999995)
(3000,0.42148)
(4000,0.40842)
(5000,0.38860000000000006)
(6000,0.4413600000000001)
(7000,0.4743200000000002)
(8000,0.44795999999999997)
(9000,0.45448000000000005)
(10000,0.45448)
(20000,0.4281600000000001)
(30000,0.46768)
(40000,0.4676200000000001)

Maybe you have different performance measurements in mind.

Or could it be that your profiled times are actually spent on generating the Map and the Array, instead of copying the case class?

like image 3
pagoda_5b Avatar answered Oct 11 '22 11:10

pagoda_5b