Making more "functional" code in Scala to use immutable collections

Tags:

data-structures

functional-programming

scala

I'm porting an algorithm from Java to Scala that does a range search on a VP Tree. Briefly, the nodes in the tree have coordinates in space and a radius: nodes within that radius can be found on the left subtree, whilst nodes outside that radius are found on the right subtree. A range search attempts to find all objects in the tree within a specified distance to a query object.

In Java I passed the function an ArrayList in which it accumulated results, recursing down one or both subtrees as needed. Here's a straight port into Scala:

def search(node: VPNode[TPoint, TObject], query: TPoint, radius: Double,
    results: collection.mutable.Set[TObject]): Unit = {

  val dist = distance(query, node.point)

  if (dist < radius)
    results += node.obj

  // left subtree: points within node.radius of node.point
  if (node.left != null && dist <= radius + node.radius)
    search(node.left, query, radius, results)

  // right subtree: points beyond node.radius, reachable when dist + radius >= node.radius
  if (node.right != null && dist + radius >= node.radius)
    search(node.right, query, radius, results)
}

Scala's default collection types are immutable, and I found it a bit annoying having to type collection.mutable. all the time, so I started looking into it. The recommendation seems to be that using the immutable collections is nearly always fine. I'm using this code to do millions of lookups though, and it seems to me that copying and concatenating the results collection would slow it down.

Answers like this, for example, suggest that the problem needs to be approached more 'functionally'.

So, what should I do to solve this problem in a more Scala-esque fashion? Ideally I'd like it to be as fast as the Java version, but I'm interested in solutions regardless (and can always profile them to see if it makes much difference).

Note, I only just started learning Scala (figured I may as well cut my teeth on something useful) but I'm not new to functional programming, having used Haskell before (although I don't think I'm that good at it!).

asked Aug 17 '13 00:08

sjmeverett

1 Answer

This is what I would consider a more functional approach:

val emptySet = Set[TObject]()

def search(node: VPNode[TPoint, TObject], query: TPoint, radius: Double): Set[TObject] = {
  val dist = distance(query, node.point)

  val left = Option(node.left) // avoid nulls
    .filter(_ => dist <= radius + node.radius) // do nothing if predicate fails
    .fold(emptySet)(l => search(l, query, radius)) // continue your search

  val right = Option(node.right)
    .filter(_ => dist + radius >= node.radius) // right subtree holds points beyond node.radius
    .fold(emptySet)(r => search(r, query, radius))

  left ++ right ++ (if (dist < radius) Set(node.obj) else emptySet)
}

Instead of passing your mutable.Set into each search call, search returns a Set[TObject], and each call combines its own result with the sets returned from its subtrees. If you expanded the calls, the final result would be the union of a one-element set for every visited node that lies within your radius.
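If the Option/filter/fold chain used for each subtree is unfamiliar, here is a minimal standalone sketch of the same pattern next to its if/else equivalent. The object and function names (OptionChain, lengthIfShort, lengthIfShortOpt) are made up for illustration and are not part of the search code.

```scala
object OptionChain {
  // Imperative form: null check and predicate in one condition.
  def lengthIfShort(s: String, limit: Int): Int =
    if (s != null && s.length <= limit) s.length else 0

  // Same logic as an Option pipeline (hypothetical example, not the search itself).
  def lengthIfShortOpt(s: String, limit: Int): Int =
    Option(s)                     // None if s is null, Some(s) otherwise
      .filter(_.length <= limit)  // turns into None when the predicate fails
      .fold(0)(_.length)          // 0 for None, otherwise apply the function
}
```

Both functions return the same value for every input; the second just makes the "nothing to do" case an explicit None instead of a failed condition.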

From an efficiency perspective, this is probably slower than the mutable version, since each ++ may build a new set. Using a List instead of a Set would probably be better, and you can convert the final List to a Set when you're done (though that is still probably not as fast as the mutable version).
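The List suggestion could look roughly like the sketch below, which threads an immutable List through the recursion as an accumulator so nothing is copied along the way. Several things here are assumptions for the sake of a self-contained example: the VPNode case class with Option children, the distance function passed as a parameter, and the name VpListSearch. The right-subtree test is written as dist + radius >= node.radius, on the assumption that the right subtree holds everything farther than node.radius from the node's point.

```scala
object VpListSearch {
  // Hypothetical node type for illustration; children are Options rather than nullable fields.
  final case class VPNode[P, A](point: P, obj: A, radius: Double,
                                left: Option[VPNode[P, A]] = None,
                                right: Option[VPNode[P, A]] = None)

  // Range search that passes an immutable List down and back up the recursion.
  // Prepending with :: is O(1), so no intermediate collection is rebuilt.
  def search[P, A](node: VPNode[P, A], query: P, radius: Double,
                   distance: (P, P) => Double,
                   acc: List[A] = Nil): List[A] = {
    val dist = distance(query, node.point)
    val withSelf = if (dist < radius) node.obj :: acc else acc
    // Left subtree: points within node.radius of node.point.
    val withLeft = node.left
      .filter(_ => dist <= radius + node.radius)
      .fold(withSelf)(l => search(l, query, radius, distance, withSelf))
    // Right subtree: points beyond node.radius of node.point.
    node.right
      .filter(_ => dist + radius >= node.radius)
      .fold(withLeft)(r => search(r, query, radius, distance, withLeft))
  }
}
```

Calling .toSet on the returned List gives a Set if you need one at the end. Whether this actually beats the mutable version is something to profile rather than assume.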

UPDATE: To answer your question about the benefits:

1. Determinism - Since it's immutable, you're always guaranteed the same results when calling this function with the same parameters. That said, your original version should be deterministic too; you just don't know who else is modifying your results, though that's probably not much of an issue.
2. Hard to read? - I think that's more a matter of opinion and of experience with different styles of programming. I found your version hard to read because the function doesn't return a value and it has multiple if statements. I agree that Option/filter/fold can look a bit strange at first, but after you've used them for a while (just like anything) they become easy to read. I would compare this to being able to read LINQ in .NET.
3. Performance - With a List, as in @huynhjl's answer, you should get performance equal to, if not better than, your original version. It appears that you don't really need a Set, which has the overhead of making sure everything in it is unique.
4. Garbage Collection - In the purely functional version you're creating new objects quickly and also dropping them quickly, which means they most likely will not survive past the GC's first generation. This is important because moving objects between generations is what forces a GC pause. In the mutable version, you're passing around a reference to the original collection, which hangs around longer and may get promoted to the next generation. This isn't exactly the greatest example, because your mutable version is probably not that long lived and who knows what you want to do with the returned object (maybe keep it around for a while). In the mutable version you'll most likely end up with a second gen collection pointing to second gen objects, while in the immutable version you'll end up with a first gen collection pointing to second gen objects. Cleaning up the immutable version will be much faster and pause-less (again, this makes some broad assumptions and generalizations about the usage of your objects and what the GC is doing; your mileage may vary).
5. Parallelism - The functional version can be easily parallelized, while the mutable version cannot. Depending on the size of your tree, this probably isn't a big issue.
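As a rough illustration of the parallelism point, here is one way a pure search could be forked across subtrees with standard-library Futures. This is only a sketch under assumptions: the VPNode case class with Option children, the distance parameter, the names VpParallelSearch and searchPar, and the depth cut-off are all hypothetical, and the right-subtree test assumes the right subtree holds points farther than node.radius from the node's point.

```scala
import scala.concurrent.{ExecutionContext, Future}

object VpParallelSearch {
  // Hypothetical node type for illustration; children are Options rather than nullable fields.
  final case class VPNode[P, A](point: P, obj: A, radius: Double,
                                left: Option[VPNode[P, A]] = None,
                                right: Option[VPNode[P, A]] = None)

  // Pure sequential search: no shared state, so it is safe to run on any thread.
  def search[P, A](node: VPNode[P, A], query: P, radius: Double,
                   distance: (P, P) => Double): List[A] = {
    val dist = distance(query, node.point)
    val self = if (dist < radius) List(node.obj) else Nil
    val left = node.left.filter(_ => dist <= radius + node.radius)
      .toList.flatMap(l => search(l, query, radius, distance))
    val right = node.right.filter(_ => dist + radius >= node.radius)
      .toList.flatMap(r => search(r, query, radius, distance))
    self ::: left ::: right
  }

  // Forks the top `depth` levels of the tree; below that, each subtree is
  // searched sequentially inside a single Future.
  def searchPar[P, A](node: VPNode[P, A], query: P, radius: Double,
                      distance: (P, P) => Double, depth: Int)
                     (implicit ec: ExecutionContext): Future[List[A]] =
    if (depth <= 0) Future(search(node, query, radius, distance))
    else {
      val dist = distance(query, node.point)
      val self = if (dist < radius) List(node.obj) else Nil
      val none = Future.successful(List.empty[A])
      // Both futures are created before they are combined, so they can run concurrently.
      val leftF = node.left.filter(_ => dist <= radius + node.radius)
        .fold(none)(l => searchPar(l, query, radius, distance, depth - 1))
      val rightF = node.right.filter(_ => dist + radius >= node.radius)
        .fold(none)(r => searchPar(r, query, radius, distance, depth - 1))
      for (l <- leftF; r <- rightF) yield self ::: l ::: r
    }
}
```

The depth cut-off is there because forking a Future per node would likely cost more than it saves on small subtrees. For millions of independent lookups, running whole queries in parallel may be the simpler option.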

Since you seem fairly interested, I would recommend reading Functional Programming in Scala. It goes over all of these basics in what I think is a great way for beginners.

answered Oct 13 '22 00:10

Noah
