Missing SPARK_HOME when using SparkLauncher on AWS EMR cluster

I am using EMR 5.0 with Spark 2.0.0. I am trying to launch a child Spark application from a Scala Spark application using org.apache.spark.launcher.SparkLauncher.

I need to set SPARK_HOME using setSparkHome:

var handle = new SparkLauncher()
  .setAppResource("s3://my-bucket/python_code.py")
  .setAppName("PythonAPP")
  .setMaster("spark://" + sparkSession.conf.get("spark.driver.host") + ":" + sparkSession.conf.get("spark.driver.port"))
  .setVerbose(true)
  .setConf(SparkLauncher.EXECUTOR_CORES, "1")
  .setSparkHome("/srv/spark") // not working
  .setDeployMode("client")
  .startApplication(
    new SparkAppHandle.Listener() {

      override def infoChanged(hndl: SparkAppHandle): Unit = {
        System.out.println(hndl.getState() + " new state!")
      }

      override def stateChanged(hndl: SparkAppHandle): Unit = {
        System.out.println(hndl.getState() + " new state!")
      }
    })

Where can I find the appropriate path to my Spark home? The cluster consists of 1 master, 1 core, and 1 task node.

Thanks!

Asked Sep 15 '16 by Ulile

1 Answer

As of emr-4.0.0, all applications on EMR are in /usr/lib. Spark is in /usr/lib/spark.
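Applied to the launcher code from the question, a minimal sketch looks like the following. Note two assumptions beyond the question itself: `setMaster("yarn")` is used because EMR runs Spark on YARN by default (the `spark://driver.host:port` master from the question is not how applications are normally submitted on EMR), and the S3 path is the hypothetical one from the question:

```scala
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// On emr-4.0.0 and later, Spark is installed under /usr/lib/spark.
val handle = new SparkLauncher()
  .setAppResource("s3://my-bucket/python_code.py")
  .setAppName("PythonAPP")
  .setMaster("yarn")                 // assumption: EMR's default YARN master
  .setDeployMode("client")
  .setVerbose(true)
  .setConf(SparkLauncher.EXECUTOR_CORES, "1")
  .setSparkHome("/usr/lib/spark")    // EMR's Spark installation directory
  .startApplication(new SparkAppHandle.Listener() {
    override def infoChanged(h: SparkAppHandle): Unit =
      println(s"info changed: ${h.getState}")
    override def stateChanged(h: SparkAppHandle): Unit =
      println(s"state changed: ${h.getState}")
  })
```

You can confirm the path by SSHing into the master node and listing /usr/lib/spark; the spark-submit script the launcher ultimately invokes lives in its bin subdirectory.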

Answered Nov 07 '22 by Jonathan Kelly