I am using spark-shell and I am unable to pick up external jars. I run Spark on EMR.
I run the following command:
spark-shell --jars s3://play/emr/release/1.0/code.jar
I get the following error:
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Warning: Skip remote jar s3://play/emr/release/1.0/code.jar
Thanks in advance.
You can also add jars using the spark-submit option --jars; with this option you can add a single jar, or multiple jars as a comma-separated list.
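For example, with local jar paths (the file names below are hypothetical):

# Pass one or more local jars as a comma-separated list
spark-shell --jars /home/hadoop/code.jar,/home/hadoop/extra-lib.jar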
Alternatively, you can place the jar in /usr/lib/spark/jars/ on the cluster, which is already on Spark's classpath on EMR. Check this tutorial from AWS for more info.
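A hedged sketch of that approach, run on the master node (the S3 path is the one from the question; sudo is assumed because the directory is root-owned):

# Copy the jar from S3 into Spark's jar directory on EMR
sudo aws s3 cp s3://play/emr/release/1.0/code.jar /usr/lib/spark/jars/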
You can access the Spark shell by connecting to the master node with SSH and invoking spark-shell . For more information about connecting to the master node, see Connect to the master node using SSH in the Amazon EMR Management Guide. The following examples use Apache HTTP Server access logs stored in Amazon S3.
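For example (the key file and public DNS name are placeholders for your own cluster):

# SSH to the EMR master node, then start the shell
ssh -i ~/mykey.pem hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
spark-shell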
This is a limitation of Apache Spark itself, not specifically Spark on EMR. When running Spark in client deploy mode (all interactive shells like spark-shell or pyspark, or spark-submit without --deploy-mode cluster or --master yarn-cluster), only local jar paths are allowed.
The reason for this is that in order for Spark to download this remote jar, it must already be running Java code, at which point it is too late to add the jar to its own classpath.
The workaround is to download the jar locally (using the AWS S3 CLI) then specify the local path when running spark-shell or spark-submit.
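A minimal sketch of that workaround, using the jar path from the question (the local destination is arbitrary):

# Download the jar from S3 to the local filesystem first
aws s3 cp s3://play/emr/release/1.0/code.jar /home/hadoop/code.jar

# Then pass the local path to --jars
spark-shell --jars /home/hadoop/code.jar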