I have a large scala code base. (https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL)
It's about 70K lines of Scala code, and we are on Scala 2.11.7.
Development is getting difficult because the edit-compile-test-debug cycle is too long for small changes.
Incremental recompile times can be a minute, sometimes longer, and that's without optimization turned on and with only a handful of edits to a few files. Sometimes a very small change causes a huge recompilation.
So my question: What can I do by way of organizing the code, that will improve compilation time?
E.g., decomposing code into smaller files? Will this help?
E.g., more smaller libraries?
E.g., avoiding use of implicits? (we have very few)
E.g., avoiding use of traits? (we have tons)
E.g., avoiding lots of imports? (we have tons - package boundaries are pretty chaotic at this point)
Or is there really nothing much I can do about this?
I feel like the very long compilation is somehow due to an immense amount of recompiling caused by dependencies, and I am thinking about how to reduce false dependencies... but that's just a theory.
I'm hoping someone else can shed some light on something we might do which would improve compilation speed for incremental changes.
Some ideas that might help - depends on your case and style of development:
Use incremental compilation: ~compile in sbt, or the incremental compilation provided by your IDE.
Use sbt-revolver, and maybe JRebel, to reload your app faster (better suited for web apps).
Use TDD: rather than running and debugging the whole app, write tests and only run those.
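A minimal sketch of that setup, assuming sbt 0.13 (the plugin version and the test name below are illustrative, not prescribed by the answer):

```scala
// project/plugins.sbt -- adds sbt-revolver; pick a version matching your sbt
addSbtPlugin("io.spray" % "sbt-revolver" % "0.7.2")
```

Then at the sbt prompt, ~compile recompiles on every save, ~testOnly *MySpec re-runs a single suite on each change, and sbt-revolver's reStart/reStop restart the app in a forked JVM without leaving sbt.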
First things first: compilation is slow because the language is rich with features. Of course, there are examples where languages with approximately the same set of features compile faster, but we have what we have.
To compile the code, run "scalac HelloWorld.scala". The scalac command compiles the Scala program and generates class files in the current directory; for a simple object you will see two, HelloWorld.class and HelloWorld$.class.
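For reference, the kind of file that invocation assumes:

```scala
// HelloWorld.scala -- compiling this with scalac produces both
// HelloWorld.class and HelloWorld$.class (the object's module class).
object HelloWorld {
  def main(args: Array[String]): Unit =
    println("Hello, world!")
}
```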
Here are the phases of the scala compiler, along with slightly edited versions of their comments from the source code. Note that this compiler is unusual in being heavily weighted towards type checking and to transformations that are more like desugarings. Other compilers include a lot of code for: optimization, register allocation, and translation to IR.
Some top-level points: There is a lot of tree rewriting. Each phase tends to read in a tree from the previous phase and transform it into a new tree. Symbols, by contrast, remain meaningful throughout the life of the compiler. So trees hold pointers to symbols, and not vice versa. Instead of rewriting symbols, new information gets attached to them as the phases progress.
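To make the tree-rewriting point concrete, here is a rough before/after sketch (illustrative, not compiler output): on 2.11, later phases rewrite a function literal into an anonymous class, while the symbol for the value stays the same.

```scala
object Desugar {
  // Before: the tree for what you write
  val inc = (x: Int) => x + 1

  // After (conceptual): what later phases rewrite the tree into on 2.11;
  // the `inc` symbol itself is unchanged, only its tree is replaced.
  val incDesugared = new Function1[Int, Int] {
    def apply(x: Int): Int = x + 1
  }
}
```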
Here is the list of phases from Global:
```scala
analyzer.namerFactory: SubComponent,
analyzer.typerFactory: SubComponent,
superAccessors,          // add super accessors
pickler,                 // serializes symbol tables
refchecks,               // perform reference and override checking, translate nested objects
liftcode,                // generate reified trees
uncurry,                 // uncurry, translate function values to anonymous classes
tailCalls,               // replace tail calls by jumps
explicitOuter,           // replace C.this by explicit outer pointers, eliminate pattern matching
erasure,                 // erase generic types to Java 1.4 types, add interfaces for traits
lambdaLift,              // move nested functions to top level
constructors,            // move field definitions into constructors
flatten,                 // get rid of inner classes
mixer,                   // do mixin composition
cleanup,                 // some platform-specific cleanups
genicode,                // generate portable intermediate code
inliner,                 // optimization: do inlining
inlineExceptionHandlers, // optimization: inline exception handlers
closureElimination,      // optimization: get rid of uncalled closures
deadCode,                // optimization: get rid of dead code
if (forMSIL) genMSIL else genJVM, // generate .class files
```
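(You can print this pipeline for your own compiler version with scalac -Xshow-phases.)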
Some workarounds with the Scala compiler
Thus the Scala compiler has to do a lot more work than the Java compiler. Beyond that, there are some particular things that make the Scala compiler drastically slower; there is a very nice writeup on these by Martin Odersky.
Further, the Java and Scala compilers both convert source code into JVM bytecode and do very little optimization. On most modern JVMs, once the program bytecode is run, it is converted into machine code for the architecture on which it is running. This is called just-in-time (JIT) compilation. The level of code optimization is, however, low with just-in-time compilation, since it has to be fast. To avoid recompiling, the so-called HotSpot compiler only optimizes parts of the code which are executed frequently.
A program might have different performance each time it is run. Executing the same piece of code (e.g. a method) multiple times in the same JVM instance might give very different performance results depending on whether the particular code was optimized in between the runs. Additionally, measuring the execution time of some piece of code may include the time during which the JIT compiler itself was performing the optimization, thus giving inconsistent results.
Another common cause of performance deterioration is the boxing and unboxing that happens implicitly when passing a primitive type as an argument to a generic method, along with frequent GC.
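A minimal sketch of that boxing cost (the names here are illustrative, not from the original):

```scala
object BoxingDemo {
  // T erases to Object, so an Int argument is boxed to java.lang.Integer
  // at every call site.
  def identityGeneric[T](x: T): T = x

  // @specialized makes the compiler emit an extra Int-typed copy of the
  // method, so calls with Int avoid the box entirely.
  def identitySpec[@specialized(Int) T](x: T): T = x

  def main(args: Array[String]): Unit = {
    identityGeneric(42) // allocates (or fetches a cached) Integer box
    identitySpec(42)    // dispatches to the Int specialization, no boxing
  }
}
```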
There are several approaches to avoiding these effects during measurement. The code should be run using the server version of the HotSpot JVM, which does more aggressive optimizations. VisualVM is a great choice for profiling a JVM application: it's a visual tool integrating several command-line JDK tools with lightweight profiling capabilities. However, Scala's abstractions are complex, and unfortunately VisualVM does not yet present them well. Profiling can still reveal processing steps that take a long time, for example heavy use of exists and forall, methods of Scala collections which take predicates and may, in the worst case, traverse the entire sequence.
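A crude sketch of the warm-up-then-measure idea (for serious numbers use a real harness such as JMH or ScalaMeter; the Bench name is made up):

```scala
object Bench {
  // Runs `body` reps times untimed so HotSpot can JIT it, then times
  // another reps executions and reports the average.
  def time[A](label: String, reps: Int)(body: => A): Unit = {
    var i = 0
    while (i < reps) { body; i += 1 }   // warm-up phase
    val t0 = System.nanoTime()
    var j = 0
    while (j < reps) { body; j += 1 }   // measured phase
    val t1 = System.nanoTime()
    println(s"$label: ${(t1 - t0) / reps / 1000} us/op over $reps reps")
  }
}
```

For example, Bench.time("exists", 10000)(data.exists(_ > threshold)) reports a per-call time taken after the first rounds of JIT compilation.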
Also, making the modules cohesive and less dependent on one another is a viable solution. Mind that intermediate code generation is sometimes machine dependent, and various architectures give varied results.
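For the module point, a sketch of how that might look as sbt subprojects (the project names are made up): a change in runtime then never recompiles frontend.

```scala
// build.sbt -- hypothetical split into smaller, less entangled modules
lazy val core = project.in(file("core"))

lazy val runtime = project.in(file("runtime"))
  .dependsOn(core)

lazy val frontend = project.in(file("frontend"))
  .dependsOn(core) // deliberately not on `runtime`: edits there
                   // can no longer trigger recompiles here
```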
An alternative: Typesafe has released Zinc, which separates the fast incremental compiler from sbt and lets Maven and other build tools use it. Using Zinc with the scala-maven-plugin has made compiling a lot faster.
A simple problem: Given a list of integers, remove the greatest one. Ordering is not necessary.
Below is one version of the solution (an average one, I guess):
```scala
def removeMaxCool(xs: List[Int]) = {
  val maxIndex = xs.indexOf(xs.max)
  xs.take(maxIndex) ::: xs.drop(maxIndex + 1)
}
```
It's idiomatic Scala, concise, and uses a few nice list functions. It's also very inefficient: it traverses the list at least three or four times.
Now consider this Java-like solution. It's what a reasonable Java developer (or Scala novice) would write.
```scala
import scala.collection.mutable.ArrayBuffer

def removeMaxFast(xs: List[Int]) = {
  var res = ArrayBuffer[Int]()
  var max = xs.head
  var first = true
  for (x <- xs) {
    if (first) {
      first = false
    } else {
      if (x > max) {
        res.append(max)
        max = x
      } else {
        res.append(x)
      }
    }
  }
  res.toList
}
```
It's totally non-idiomatic Scala: non-functional and non-concise, but very efficient. It traverses the list only once!
So trade-offs should be weighed, and sometimes you may have to write code the way a Java developer would, if nothing else works.
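For what it's worth, the gap can often be narrowed. Here is one illustrative single-pass version (not from the original answer) that stays functional by folding while tracking the current maximum:

```scala
// Removes one occurrence of the maximum in a single traversal.
// The problem says ordering is not required, so the kept elements
// are returned in reverse traversal order.
def removeMaxFold(xs: List[Int]): List[Int] = xs match {
  case Nil => Nil
  case h :: t =>
    val (_, kept) = t.foldLeft((h, List.empty[Int])) {
      case ((m, acc), x) =>
        if (x > m) (x, m :: acc) else (m, x :: acc)
    }
    kept
}
```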