Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Making more "functional" code in Scala to use immutable collections

I'm porting an algorithm from Java to Scala that does a range search on a VP Tree. Briefly, the nodes in the tree have coordinates in space and a radius: nodes within that radius can be found on the left subtree, whilst nodes outside that radius are found on the right subtree. A range search attempts to find all objects in the tree within a specified distance to a query object.

In Java I passed the function an arraylist in which it accumulated results, possibly recursing down one of either or both subtrees. Here's a straight port into Scala:

def search(node: VPNode[TPoint, TObject], query: TPoint, radius: Double,
    results: collection.mutable.Set[TObject]) {

  var dist = distance(query, node.point)

  if (dist < radius)
    results += node.obj

  if (node.left != null && dist <= radius + node.radius)
    search(node.left, query, radius, results)

  if (node.right != null && dist >= radius + node.radius)
    search(node.right, query, radius, results)
}

Scala's default collection types are immutable, and I was thinking it was a bit annoying having to type collection.mutable. all the time, so I started looking into it. It seems to be recommended that using the immutable collections are nearly always fine: I'm using this code to do millions of lookups though, and it seems to me that copying and concatenating the results array would slow it down.

Answers like this for example suggest that the problem needs to be approached more 'functionally'.

So, what should I do to solve this problem in a more Scala-esque fashion? Ideally I'd like it to be as fast as the Java version, but I'm interested in solutions regardless (and can always profile them to see if it makes much difference).

Note, I only just started learning Scala (figured I may as well cut my teeth on something useful) but I'm not new to functional programming, having used Haskell before (although I don't think I'm that good at it!).

like image 231
sjmeverett Avatar asked Aug 17 '13 00:08

sjmeverett


People also ask

Does Scala have immutable collections?

Even though the static type of such a collection provides no operations for modifying the collection, it might still be possible that the run-time type is a mutable collection which can be changed by other clients. By default, Scala always picks immutable collections.

How do you make a class immutable in Scala?

Instead of mutating the state, create new state with computed value and return it to the user. In the same way, check for funds and create new state and return it to the user. In functional programming it is very common to create new state instead of trying to mutate the old state.

What is immutable in Scala?

An object whose state cannot change after it has been constructed is called immutable (unchangable). The methods of an immutable object do not modify the state of the object. In Scala, all number types, strings, and tuples are immutable.

Which classes Scala supports for the development of collection classes?

They are supported by class immutable. HashMap. Their representation is similar to vectors in that they are also trees where every node has 32 elements or 32 subtrees. But the selection of these keys is now done based on hash code.


1 Answers

This is what I would consider a more functional approach:

val emptySet = Set[TObject]()

def search(node: VPNode[TPoint, TObject], query: TPoint, radius: Double): Set[TObject] = {
  val dist = distance(query, node.point)

  val left = Option(node.left) // avoid nulls
    .filter(_ => dist <= radius + node.radius) // do nothing if predicate fails
    .fold(emptySet)(l => search(l, query, radius)) // continue your search

  val right = Option(node.right)
    .filter(_ => dist >= radius + node.radius)
    .fold(emptySet)(r => search(r, query, radius))

  left ++ right ++ (if (dist < radius) Set(node.obj) else emptySet)
}

Instead of passing around your mutable.Set to each search function, the search function returns a Set[TObject] which it then concatenates onto other sets. If you were to build up the function calls, it would look like each node of your tree was being concatenated with each other (assuming they were in your radius).

From an efficiency perspective, this is probably not as efficient as the mutable version. Using a List instead of a Set would probably be better, and then you can convert the final List to a Set when you're done (though still probably not as fasts as the mutable version).

UPDATE To answer your question about the benefits:

  1. Determinism - Since it's immutable you're always guaranteed the same results when calling this function with the same paramaters. With that said, you're original version should be deterministic, you just don't know who else is modifiying your results, though that's probably not much of an issue.
  2. Hard to read? - I think that's more a matter of opinion and experience in different styles of programming. I found your version hard to read because you don't return any value from the function and you have multiple if statements. I agree that at first Option/filter/fold can look a bit strange, but after you start using them for awhile (just like anything) it becomes easy to read. I would compare this to being able to read LINQ in .NET.
  3. Performance - Using @huynhjl's answer using a List you should get equal if not better performance from your original version. It appears that you don't really need to use Set which has the overhead of making sure everything in the set is unique.
  4. Garbage Collection - In the purely functional version you're creating new objects quickly and also dropping them quickly which means they most likely will not survive past the GC's first generation. This is important because moving objects between generations is what forces a GC pause. In the mutable version, you're passing around a reference to the original collection which hangs around longer and may get compacted to the next generation. This isn't exactly the greatest example because your mutable version is probably not that long lived and who knows what you want to do with the return object (maybe keep it around for awhile). In the mutable version you'll most likely end up with a second gen collection pointing to second gen objects, while the immutable version you'll end up with a first gen collection pointing to second gen objects. Cleaning up the immutable version will be much faster and pause-less (again, this is making some broad assumptions and generalizations about the usage of your objects and what the GC is doing, your mileage may vary).
  5. Parallelism - The functional version can be easily parallelized, while the mutable version cannot. Depending on the size of your tree this probably isn't a big issue.

Since you seem fairly interested, I would recommend reading Functional Programming in Scala. It goes over all of these basics in what I think is a great way for beginners.

like image 147
Noah Avatar answered Oct 13 '22 00:10

Noah