Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Where to look first when optimizing Scala code? [closed]

Tags:

I currently need to optimize a Scala implementation of an algorithm which is too slow. It is implemented in a functional way, uses only values (val) and immutable data structures. I am at the point where I already memoized important functions (so there are a few mutable maps in my code), which made my code twice as fast and I wonder what to do next.

So, I am not looking for generic advice on software optimization (e.g. optimize your algortihm first, use a profiler, do benchmarks...) but rather for Scala-specific or JVM-specific optimization advice.

My question is therefore where to look first when trying to optimize Scala code ? What are the common language constructs or patterns that usually cause slowdowns ?

In particular, I am seeking advice on the following points :

  • I read that for(...) constructs are slow because an anonymous class is generated each and every time the body of the loop is executed. Is it true ? Are there any other places where an anonymous class is generated ? (e.g. when using map() with an anonymous function)
  • Are immutable collections significantly slower than mutable collections in the general case (especially when it comes to map structures) ?
  • Are there any significant performance differences between Scala 2.8, 2.9 and 2.10 ?
like image 533
Xion345 Avatar asked Feb 27 '13 12:02

Xion345


People also ask

How do you optimize an existing code?

Optimize Program Algorithm For any code, you should always allocate some time to think the right algorithm to use. So, the first task is to select and improve the algorithm which will be frequently used in the code. 2. Avoid Type Conversion Whenever possible, plan to use the same type of variables for processing.

What happens in code Optimisation?

Code optimization is any method of code modification to improve code quality and efficiency. A program may be optimized so that it becomes a smaller size, consumes less memory, executes more rapidly, or performs fewer input/output operations.

What is it called when you optimize code?

In computer science, program optimization, code optimization, or software optimization, is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources.

What is object oriented programming language in Scala?

Object- Oriented: Every value in Scala is an object so it is a purely object-oriented programming language. The behavior and type of objects are depicted by the classes and traits in Scala. Functional: It is also a functional programming language as every function is a value and every value is an object.

How to run a Scala program on command line?

Below steps demonstrate how to run a Scala program on Command line in Windows/Unix Operating System: Open Commandline and then to compile the code type scala Hello.scala. If your code has no error then it will execute properly and output will be displayed. Variables are simply a storage location.

What is the easiest way to learn Scala?

Since Scala is a lot similar to other widely used languages syntactically, it is easier to code and learn in Scala. scala programs can be written on any plain text editor like notepad, notepad++, or anything of that sort.

How do you define methods in Scala?

In Scala, methods are defined inside classes (just like Java), but for testing purposes you can also create them in the REPL. This lesson will show some examples of methods so you can see what the syntax looks like.


1 Answers

I also had to optimize a lot of Scala code in the past. The following is not meant to be a complete list, just a few practical observations that might help you:

  • Yes, replacing a for loop by a while is faster, even with Scala 2.10. See the linked talk in the comments for details on that. Also, be aware that using "for filtering" (a condition following the collection you are iterating) will lead to box/unboxing of your condition, which can have a big impact on performance (see this post for details).

  • The question immutable vs. mutable is simply answered by the number of updates you have to perform and it is difficult (for me) to give a general answer here.

  • So far, I did not observe significant performance differences between 2.8, 2.9 and 2.10. But clearly this depends on the problem at hand. For instance, if your algorithm makes heavy use of Range.sum, you will observe big differences (because this is now O(1) in 2.10).

  • I noticed that using the corresponding Java collection instead of the Scala version can also lead to significant speed-ups (as ballpark-figure I would say in the order of 5-10%). For instance, I had the following results (displayed are runtimes) in a microbenchmark for a very specific problem (note: don't generalize from that; run your own).

    ColtBitVector          min:      0.042    avg:      0.245    max:     40.120 JavaBitSet             min:      0.043    avg:      0.165    max:      4.306 JavaHashSet            min:      0.191    avg:      0.716    max:     12.624 JavaTreeSet            min:      0.313    avg:      1.428    max:     64.504 ScalaBitSetImmutable   min:      0.380    avg:      1.675    max:     13.838 ScalaBitSetMutable     min:      0.423    avg:      3.693    max:    457.146 ScalaSetImmutable      min:      0.458    avg:      2.305    max:      9.998 ScalaSetMutable        min:      0.340    avg:      1.332    max:     10.974 

    The problem at hand was to calculate a simple intersection of integer sets (with very specific size and number of sets). What I want to demonstrate: choosing the right/wrong collection can have a significant impact! Again, I think that it is hard to give a general advise which of these data types to choose, since this only tells us the performance in this special intersection problem (but I did chose Java's HashSet in a few over cases over the alternatives). Also, note that this intersection problem does not require a mutable data type. Nevertheless, there can be performance differences even in the immutable functionality (and whereas regarding Set it was the mutable collection which was significantly faster, it is the immutable one for BitSet). Thus, depending on the situation you might want to chose a mutable collection over an immutable for maximum performance (use with care!).

  • I was told that declaring a variable private[this] var foo = ... prevents the creation of getter/setter functions and should be faster (disclaimer: I never confirmed that in a microbenchmark).

  • When dealing with generic types, defining @specialized version for specific types should result in a speed-up.

  • Although I try to avoid generalizations, I could live with the following: Try to use native Arrays. In many of my benchmarks I just ended up using Arrays, which makes sense considering their implementation in the JVM.

  • A minor point that comes to my mind: I observed differences in constructing collections either by origCollection.toSomeCollectionName over manual construction and construction using the companion object (i.e., SomeCollectionName(origCollection :_*)). In many cases the latter was significantly faster.

like image 98
bluenote10 Avatar answered Oct 16 '22 05:10

bluenote10