
Running Scala 2.12 on EMR 5.29.0

I have a jar file compiled with Scala 2.12 and I want to run it on EMR 5.29.0. How can I run it, given that the default Scala version on EMR 5.29.0 is 2.11?

Ram asked Feb 25 '20 18:02




1 Answer

According to this thread on the AWS Forum, Spark on EMR is built with Scala 2.11, which AWS considers the stable version:

On EMR, Spark is built with Scala-2.11.x, which is currently the stable version. As per- https://spark.apache.org/releases/spark-release-2-4-0.html , Scala-2.12 is still under experimental support. Our service team is already aware of this feature request, and they shall be adding Scala-2.12.0 support in coming releases, once it becomes stable.

So you'll either have to wait until support is added in a future EMR release, or build Spark with Scala 2.12 yourself and install it on EMR. See "Building and Deploying Custom Applications with Apache Bigtop and Amazon EMR" and "Building a Spark Distribution for EMR".

UPDATE:

Since Release 6.0.0, Scala 2.12 can be used with Spark on EMR:

Changes, Enhancements, and Resolved Issues

  • Scala

    Scala 2.12 is used with Apache Spark and Apache Livy.
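
To make the rule above concrete, here is a minimal sketch of a lookup you could keep in a build or deploy script to decide which Scala version a jar has to be compiled against for a given EMR release. The function name `scala_version_for_emr` is hypothetical and not part of any AWS tool. The mapping only encodes what is stated in this answer (5.x uses Scala 2.11, 6.0.0 and later use Scala 2.12); it assumes releases after 6.0.0 stay on 2.12, so check the release notes for your exact version.

```python
def scala_version_for_emr(release_label: str) -> str:
    """Return the Scala binary version Spark is built with on an EMR release.

    Hypothetical helper: it only encodes the rule described in this answer
    and should be verified against the EMR release notes.
    """
    # Accept labels written as "emr-5.29.0" or just "5.29.0".
    label = release_label.strip().lower()
    if label.startswith("emr-"):
        label = label[len("emr-"):]

    # Only the major version matters for this rule.
    major = int(label.split(".")[0])

    if major >= 6:
        # Since release 6.0.0, Scala 2.12 is used with Spark
        # (assumption: later releases keep using 2.12).
        return "2.12"
    if major == 5:
        # Spark on EMR 5.x is built with Scala 2.11.
        return "2.11"
    # Older releases are not covered by this answer.
    raise ValueError(f"No Scala version known for EMR release {release_label!r}")


if __name__ == "__main__":
    print(scala_version_for_emr("emr-5.29.0"))  # 2.11
    print(scala_version_for_emr("emr-6.0.0"))   # 2.12
```

For the cluster in the question this returns 2.11, so a jar built for Scala 2.12 needs EMR 6.0.0 or later, or a custom Spark build as described above.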

blackbishop answered Oct 02 '22 10:10