
AWS EMR Spark: Error: Cannot load main class from JAR

I am trying to submit a Spark job to an AWS EMR cluster using the AWS console, but it fails with:

Cannot load main class from JAR

The job runs successfully when I specify the main class with --class in the Arguments option under AWS EMR Console -> Add Step.

On my local machine, the job works fine when no main class is specified, as below:

 ./spark-submit /home/astro/spark-programs/SpotEMR/MyJob.jar

I have set the main class for the JAR using a run configuration. The main reason I want to avoid passing the main class with --class is that I have to run this job in AWS Data Pipeline using EmrActivity. In AWS Data Pipeline, there is currently no way to specify a main class for a job being submitted.

Any help will be appreciated.

Atish asked Oct 17 '22 23:10


1 Answer

Actually, you can pass the job's main class with EmrActivity in AWS Data Pipeline.

See https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emractivity.html for how to launch an EmrActivity using a step, and https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html for how to submit a Spark job through an EMR step with a main class.

The step would look as follows:

command-runner.jar,spark-submit,--class,org.apache.spark.examples.SparkPi
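
To show where such a step string might sit in a pipeline definition, here is a minimal sketch. The class name, S3 path, and object ids (com.example.MyJob, s3://my-bucket/MyJob.jar, MySparkActivity, MyEmrCluster) are hypothetical placeholders, not taken from the question. The --deploy-mode,cluster arguments are also my assumption, added because the JAR is read from S3. The script only assembles the comma-separated step and prints an EmrActivity object; it does not call AWS.

```shell
#!/bin/sh
# Hypothetical values: replace with your own main class and JAR location.
MAIN_CLASS="com.example.MyJob"
JAR_PATH="s3://my-bucket/MyJob.jar"

# A step is a comma-separated list: the runner JAR, the command, then its arguments.
# --deploy-mode,cluster is an assumption here, not part of the original answer.
STEP="command-runner.jar,spark-submit,--deploy-mode,cluster,--class,${MAIN_CLASS},${JAR_PATH}"

# Print an EmrActivity object that could be placed in a pipeline definition file.
# "MySparkActivity" and "MyEmrCluster" are placeholder ids; the EmrCluster object
# referenced by runsOn must be defined elsewhere in the same definition.
cat <<EOF
{
  "id": "MySparkActivity",
  "type": "EmrActivity",
  "runsOn": { "ref": "MyEmrCluster" },
  "step": "${STEP}"
}
EOF
```

Since the main class travels inside the step string, nothing extra is needed on the Data Pipeline side beyond the step field itself.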
Frederic answered Oct 21 '22 03:10