 

EMR spark-shell not picking up jars

I am running Spark on EMR, and spark-shell is not picking up external jars.

I run the following command:

spark-shell --jars s3://play/emr/release/1.0/code.jar

I get the following error:

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Warning: Skip remote jar s3://play/emr/release/1.0/code.jar

Thanks in advance.

asked Feb 25 '16 by user2509471

People also ask

How do you import jars into Spark shell?

You can add jars with the --jars option of spark-submit (or spark-shell); it accepts a single jar or multiple jars separated by commas.
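A minimal sketch of the comma-separated form, assuming two hypothetical local jar paths:

# Hypothetical local jars; replace with your own paths
spark-shell --jars /home/hadoop/libs/code.jar,/home/hadoop/libs/deps.jar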

Where does Spark Look for jars?

On EMR, Spark's bundled jars are in /usr/lib/spark/jars/. Check this tutorial from AWS to see more info.
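For example, on the EMR master node you can inspect that directory to see which jars ship with the cluster (the grep pattern is just an illustration):

# List bundled Spark jars and filter for a library of interest
ls /usr/lib/spark/jars/ | grep -i json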

How do I start my EMR Spark?

You can access the Spark shell by connecting to the master node with SSH and invoking spark-shell . For more information about connecting to the master node, see Connect to the master node using SSH in the Amazon EMR Management Guide. The following examples use Apache HTTP Server access logs stored in Amazon S3.
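A sketch of that flow, where the key file and master public DNS are placeholders you would substitute with your own:

# SSH to the EMR master node (placeholder key and hostname)
ssh -i ~/mykey.pem hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com

# Then start the interactive shell on the master node
spark-shell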


1 Answer

This is a limitation of Apache Spark itself, not specifically Spark on EMR. When running Spark in client deploy mode (all interactive shells like spark-shell or pyspark, or spark-submit without --deploy-mode cluster or --master yarn-cluster), only local jar paths are allowed.

The reason for this is that in order for Spark to download this remote jar, it must already be running Java code, at which point it is too late to add the jar to its own classpath.

The workaround is to download the jar locally (for example with the AWS S3 CLI) and then pass the local path when running spark-shell or spark-submit.
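A minimal sketch of that workaround, reusing the S3 path from the question; the local destination under /home/hadoop is an assumption:

# Copy the jar from S3 to the master node (local path is an assumption)
aws s3 cp s3://play/emr/release/1.0/code.jar /home/hadoop/code.jar

# Start the shell with the local jar path instead of the s3:// URI
spark-shell --jars /home/hadoop/code.jar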

answered Sep 20 '22 by Jonathan Kelly