
Using Scala 2.12 with Spark 2.x


In the Spark 2.1 docs it's mentioned that:

Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API, Spark 2.1.0 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).

In the Scala 2.12 release announcement it's also mentioned that:

Although Scala 2.11 and 2.12 are mostly source compatible to facilitate cross-building, they are not binary compatible. This allows us to keep improving the Scala compiler and standard library.

But when I build an uber jar (using Scala 2.12) and run it on Spark 2.1, everything works just fine.
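Roughly, the build looks something like this (a simplified sketch, not my exact build file; the project name is made up and sbt-assembly is assumed for producing the uber jar):

    // build.sbt -- illustrative sketch only
    name := "spark-app"                  // made-up project name
    scalaVersion := "2.12.1"             // the application itself is compiled with Scala 2.12

    // Spark 2.1.0 is only published for Scala 2.10/2.11, so the artifact suffix is
    // pinned by hand with % instead of letting %% pick it -- which is how the two
    // Scala versions end up mixed inside one uber jar.
    libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided"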

And I know it's not an official source, but on the 47 Degrees blog they mention that Spark 2.1 does support Scala 2.12.

How can one explain these (conflicting?) pieces of information?

asked Mar 19 '17 by NetanelRabinowitz


People also ask

Is Scala 2.12 backwards compatible?

So Scala 2.11 => Scala 2.12 is a major release. It's not a minor release! Scala major releases are not backwards compatible. Java goes to extreme lengths to maintain backwards compatibility, so Java code that was built with Java 8 can be run with Java 14.

Does Scala 2.11 work with Spark 3?

Scala 2.12, which is used by Spark 3, is incompatible with Scala 2.11. If you run Spark jobs built against Scala 2.11 jars, you need to rebuild them with Scala 2.12. Scala 2.11 and 2.12 are mostly source compatible, but not binary compatible.

Is Scala 3 compatible with Spark?

We can already use Scala 3 to build Spark applications thanks to the compatibility between Scala 2.13 and Scala 3.


2 Answers

Spark does not support Scala 2.12. You can follow SPARK-14220 (Build and test Spark against Scala 2.12) to get the up-to-date status.

Update: Spark 2.4 added experimental Scala 2.12 support.

answered Dec 12 '22 by user7735456


Scala 2.12 is officially supported (and required) as of Spark 3. Summary:

  • Spark 2.0 - 2.3: Required Scala 2.11
  • Spark 2.4: Supported both Scala 2.11 and Scala 2.12, but not really in practice, because almost all runtimes only supported Scala 2.11.
  • Spark 3: Only Scala 2.12 is supported

Using a Spark runtime that's compiled with one Scala version and a JAR file that's compiled with another Scala version is dangerous and causes strange bugs. For example, as noted here, using a Scala 2.11 compiled JAR on a Spark 3 cluster will cause this error: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps.
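To make that concrete, here is a minimal (hypothetical) example of how an ordinary-looking program hits that error: any Array method such as mkString goes through the implicit scala.Predef.refArrayOps conversion, and that method's signature changed between Scala 2.11 and 2.12, so a class compiled against 2.11 fails the first time it touches an array on a 2.12 runtime.

    // Compiled with Scala 2.11, then run on a Scala 2.12 (Spark 3) cluster.
    object VersionMismatchDemo {
      def main(args: Array[String]): Unit = {
        val names = Array("alice", "bob")
        // mkString is not defined on Array itself; the compiler inserts a call to
        // scala.Predef.refArrayOps, whose Scala 2.11 signature no longer exists in the
        // 2.12 standard library, so this line throws java.lang.NoSuchMethodError.
        println(names.mkString(", "))
      }
    }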

Look at all the poor Spark users running into this very error.

Make sure to look into Scala cross compilation and understand the %% operator in SBT to limit your suffering. Maintaining Scala projects is hard and minimizing your dependencies is recommended.
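For example, a cross-built project might look something like this (a rough sketch; the version numbers are placeholders, so match them to your actual cluster):

    // build.sbt -- sketch of cross building for two Scala versions
    scalaVersion := "2.12.15"                        // default Scala version for this build
    crossScalaVersions := Seq("2.11.12", "2.12.15")  // `sbt +package` builds both

    // %% appends the right suffix (spark-sql_2.11 or spark-sql_2.12) for the Scala
    // version being built, keeping the JAR's suffix in sync with the Spark runtime.
    // Spark 2.4.x is used here because it is published for both Scala versions.
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.8" % "provided"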

answered Dec 12 '22 by Powers