 

Why did Scala's library double its size between 2.7 and 2.8?


Comparing Scala 2.7.7 (last 2.7.x release) with Scala 2.8.1 (latest 2.8.x release) I gathered the following metrics:

 Scala version        |    2.7.7          2.8.1
 ------------------------------------------------
 Compressed jar file  |   3.6 MB         6.2 MB
 Uncompressed files   |   8.3 MB        16.5 MB
 .class files in .    |   1.8 MB         1.7 MB
   in ./actors        | 554.0 KB         1.3 MB
   in ./annotation    |   962  B        11.7 KB
   in ./collection    |   2.8 MB         8.8 MB
   in ./compat        |   3.8 KB         3.8 KB
   in ./concurrent    | 107.3 KB       228.0 KB
   in ./io            | 175.7 KB       210.6 KB
   in ./math          |    ---         337.5 KB
   in ./mobile        |  40.8 KB        47.3 KB
   in ./ref           |  21.8 KB        26.5 KB
   in ./reflect       | 213.9 KB       940.5 KB
   in ./runtime       | 271.0 KB       338.9 KB
   in ./testing       |  47.1 KB        53.0 KB
   in ./text          |  27.6 KB        34.4 KB
   in ./util          |   1.6 MB         1.4 MB
   in ./xml           | 738.9 KB         1.1 MB
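
The question does not say how these numbers were gathered. One way to reproduce per-package figures like these is to sum the uncompressed sizes of the .class entries in the library jar, grouped by the first package below scala/. The following is a minimal sketch under that assumption; the object name JarSizes and its helper names are hypothetical, and the jar path is passed as the first program argument.

```scala
import java.util.zip.ZipFile

// Hypothetical helper, not part of any Scala distribution.
object JarSizes {
  // "scala/collection/immutable/List.class" -> "collection";
  // class files directly under scala/ are grouped as ".".
  def packageOf(entryName: String): String = {
    val parts = entryName.split('/')
    if (parts.length > 2) parts(1) else "."
  }

  // Sums uncompressed sizes (bytes) of .class entries under scala/, per package.
  def sizesByPackage(entries: Seq[(String, Long)]): Map[String, Long] =
    entries
      .filter { case (name, _) => name.startsWith("scala/") && name.endsWith(".class") }
      .groupBy { case (name, _) => packageOf(name) }
      .map { case (pkg, es) => pkg -> es.map(_._2).sum }

  // Reads (entry name, uncompressed size) pairs from a jar file.
  def readEntries(jarPath: String): Seq[(String, Long)] = {
    val zip = new ZipFile(jarPath)
    try {
      val buf = Vector.newBuilder[(String, Long)]
      val it = zip.entries()
      while (it.hasMoreElements) {
        val e = it.nextElement()
        buf += ((e.getName, e.getSize))
      }
      buf.result()
    } finally zip.close()
  }

  def main(args: Array[String]): Unit =
    sizesByPackage(readEntries(args(0))).toSeq.sortBy(_._1).foreach {
      case (pkg, bytes) => println(f"$pkg%-12s ${bytes / 1024.0}%10.1f KB")
    }
}
```

Running it once against each version's scala-library jar and comparing the two listings should give a breakdown comparable to the table above, although the exact numbers depend on how sizes are counted.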

The biggest offenders are scala.collection (3.1 times bigger) and scala.reflect (4.4 times bigger). The increase in the collection package is in the same time frame as the big rewrite of the whole collection framework for 2.8, so I guess that's the cause.

I always assumed that the type system magic which computes the best return type of the collection class methods (the big change in 2.8) would be done at compile time and would not be visible after that.

  • Why did the rewrite result in such a big increase in size?

As far as I know, there are plans to improve scala.io, scala.reflect and scala.swing. There are at least two other actor libraries that do the same as scala.actors (Lift actors) or a lot more (Akka), and scala.testing has officially been superseded by third-party testing libraries.

  • Will an improved scala.io, scala.reflect or scala.swing result in a comparable size increase or was the case of scala.collection a really special circumstance?

  • Is it considered to delegate the actors implementation to Lift or Akka, if there will be a usable modularization system in JDK 8?

  • Are there plans to finally remove scala.testing or split it from the library jar-file?

  • Might the inclusion of SAM types, Defender Methods or MethodHandles in JDK7/JDK8 lead to a possibility of reducing the number of classes the Scala compiler has to generate for anonymous/inner class/singletons/etc.?

asked Nov 23 '10 13:11 by soc



2 Answers

Specialization was one factor (about 0.9 MB worth of increase in the jar). Another factor is the collection libraries, which now implement a larger set of operations uniformly over a larger set of implementation types. A lot of the increase is only in the bytecode, because the new collection libraries make very heavy use of mixin composition, which tends to increase classfile size. I don't have data on sourcefile size, but I believe the increase there was much smaller.
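
As an illustration of the mixin point (this sketch is not part of the original answer, and all names in it are hypothetical): when several classes mix in the same traits with concrete methods, the source stays short, but each class may carry its own forwarder methods in its class file. Whether and how forwarders are emitted depends on the compiler version and settings, so the sketch only prints what reflection reports and makes no claim about the exact output.

```scala
// Hypothetical traits with concrete members, written once in source.
trait Describable {
  def name: String
  def describe: String = "I am " + name
  def shout: String = describe.toUpperCase
}

trait Countable {
  def items: List[Int]
  def total: Int = items.sum
  def count: Int = items.size
}

// Each class mixes in both traits with a single line of source,
// yet each compiled class has to expose all inherited members.
final class Basket(val name: String, val items: List[Int]) extends Describable with Countable
final class Ledger(val name: String, val items: List[Int]) extends Describable with Countable

object MixinDemo {
  // Lists the methods physically declared in a compiled class.
  def declaredMethodNames(c: Class[_]): List[String] =
    c.getDeclaredMethods.toList.map(_.getName).sorted

  def main(args: Array[String]): Unit = {
    println(declaredMethodNames(classOf[Basket]))
    println(declaredMethodNames(classOf[Ledger]))
  }
}
```

If the printed lists include describe, shout, total and count for both classes, the compiler has duplicated forwarders per class, which is the kind of bytecode growth the answer describes without a matching growth in source size.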

answered Sep 29 '22 21:09 by Martin Odersky


I'm not in any way associated with the Scala project or any of the companies that support it. So take everything below as my own personal opinion.

  • Why did the rewrite result in such a big increase in size?

Most likely, not the rewrite itself, but specialization. In particular, this definition of Function1:

trait Function1[@specialized(scala.Int, scala.Long, scala.Float, scala.Double) -T1, @specialized(scala.Unit, scala.Boolean, scala.Int, scala.Float, scala.Long, scala.Double) +R] 

means that every method in Function1 will be implemented 35 times: once for each combination of Int, Long, Float, Double and AnyRef for T1 with Unit, Boolean, Int, Float, Long, Double and AnyRef for R (5 × 7 = 35).
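
The figure of 35 can be checked by enumerating the combinations. The sketch below only counts pairs of type names taken from the annotation quoted above, plus the generic AnyRef case on each side; the object name SpecializationCount is hypothetical, and the code says nothing about what a particular compiler version actually emits.

```scala
object SpecializationCount {
  // Argument types from the first @specialized list, plus the generic version.
  val argTypes: List[String] = List("Int", "Long", "Float", "Double", "AnyRef")

  // Result types from the second @specialized list, plus the generic version.
  val resultTypes: List[String] =
    List("Unit", "Boolean", "Int", "Float", "Long", "Double", "AnyRef")

  // Every (T1, R) pairing corresponds to one variant of each method.
  val variants: List[(String, String)] =
    for (a <- argTypes; r <- resultTypes) yield (a, r)

  def main(args: Array[String]): Unit =
    println(variants.size + " variants per method") // 5 * 7 = 35
}
```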

Now, look at the Scaladoc and see the known subclasses of Function1. I won't even bother copying the list here. Function0 and Function2 were also specialized, though their impact is much smaller.

If anything, I'd bet the rewrite decreased the final footprint, because of the extensive code reuse it enabled.

As for reflect, it went from being almost non-existent to providing fundamental features to the new collection library, so it is no surprise it had a big relative increase.

  • Will an improved scala.io, scala.reflect or scala.swing result in a comparable size increase or was the case of scala.collection a really special circumstance?

Not comparable, because the rewrite had nothing to do with it. However, a true scala.io library would certainly be much bigger than the little that exists nowadays, and I'd expect the same of a true reflection system for Scala (there have been papers about the latter). As for swing, I don't expect much beyond incremental improvements to it, mostly wrappers around Java libraries, so I doubt it would change much in size.

  • Is it considered to delegate the actors implementation to Lift or Akka, if there will be a usable modularization system in JDK 8?

Each implementation has its own strengths, and I haven't seen any signs of convergence for the time being. As for JDK 8, how is Scala supposed to be compatible with JDK 5 while modularizing for JDK 8? I don't mean it is not possible, but it is quite likely too much effort for the available resources.

  • Are there plans to finally remove scala.testing or split it from the library jar-file?

It has been discussed, but there's also a concern about having some sort of testing framework available for the compiler itself, with the flexibility a third-party testing framework would not provide. It might well be moved to the compiler jar instead (or removed and replaced with something else), though.

  • Might the inclusion of SAM types, Defender Methods or MethodHandles in JDK7/JDK8 lead to a possibility of reducing the number of classes the Scala compiler has to generate for anonymous/inner class/singletons/etc.?

Sure, once no one else uses JDK5/JDK6 anymore. Of course, if JDK7/JDK8 get widespread adoption and the improvements are sufficiently worthwhile, then there might well come a time when Scala gets distributed with two distinct jar files for its library. But, at this point, it is too early to conjure up hypothetical scenarios.

answered Sep 29 '22 23:09 by Daniel C. Sobral