
When to use SPARK_CLASSPATH or SparkContext.addJar

Tags:

apache-spark

I'm using a standalone Spark cluster with one master and two workers. I don't really understand how to use SPARK_CLASSPATH or SparkContext.addJar properly. I tried both, and it looks like addJar doesn't work the way I believed it did.

In my case I tried to use some joda-time functions, both inside closures and outside. If I set SPARK_CLASSPATH with a path to the joda-time jar, everything works fine. But if I remove SPARK_CLASSPATH and instead add in my program:

JavaSparkContext sc = new JavaSparkContext("spark://localhost:7077", "name", "path-to-spark-home", "path-to-the-job-jar"); // master URL, app name, Spark home, application jar
sc.addJar("path-to-joda-jar"); // ship the joda-time jar to the executors

it doesn't work anymore, although in the logs I can see:

14/03/17 15:32:57 INFO SparkContext: Added JAR /home/hduser/projects/joda-time-2.1.jar at http://127.0.0.1:46388/jars/joda-time-2.1.jar with timestamp 1395066777041

and immediately after:

Caused by: java.lang.NoClassDefFoundError: org/joda/time/DateTime
    at com.xxx.sparkjava1.SimpleApp.main(SimpleApp.java:57)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: org.joda.time.DateTime
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

I used to assume that SPARK_CLASSPATH set the classpath for the driver part of the job, and that SparkContext.addJar set the classpath for the executors, but that no longer seems right.

Does anyone understand this better than I do?

VirgileD asked Mar 17 '14

People also ask

Where do I put .JAR files in spark?

To add JARs to a Spark job, the --jars option can be used to include them on the Spark driver and executor classpaths. If multiple JAR files need to be included, separate them with commas. The following is an example: spark-submit --jars /path/to/jar/file1,/path/to/jar/file2 ...
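As a slightly fuller sketch of such an invocation (the class name, application jar, and dependency paths are illustrative, not taken from the original post):

spark-submit --class com.example.MyApp \
  --master spark://localhost:7077 \
  --jars /path/to/joda-time-2.1.jar \
  /path/to/my-app.jar

Here --jars distributes the listed dependency jars to the driver and executors, while the final argument is the application jar containing the job itself.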

What is JAR file in spark-submit?

Spark JAR files let you package a project into a single file so it can be run on a Spark cluster. Many developers write Spark code in browser-based notebooks because they're unfamiliar with JAR files.

What are jars in spark?

JARs are bundles of compiled Java code. Each library you install that internally uses Spark (or PySpark) has its own JAR files, which need to be available to both the driver and the executors so they can execute the package API calls the user interacts with.


1 Answer

SparkContext.addJar is broken in 0.9, as is the ADD_JARS environment variable. It worked as documented in 0.8.x, and the fix has already been committed to master, so it is expected in the next release. For now you can either use the workaround described in the Jira issue or make a patched Spark build.

See relevant mailing list discussion: http://mail-archives.apache.org/mod_mbox/spark-user/201402.mbox/%[email protected]%3E

Jira issue: https://spark-project.atlassian.net/plugins/servlet/mobile#issue/SPARK-1089
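In the meantime, the approach the asker already confirmed works, putting the dependency jar on SPARK_CLASSPATH for the driver and each worker, is a reasonable stopgap. This is only a sketch, not necessarily the exact workaround from the Jira ticket; the jar path is taken from the logs in the question:

# in conf/spark-env.sh on the driver and on each worker machine
export SPARK_CLASSPATH=/home/hduser/projects/joda-time-2.1.jar

Once addJar works again, the jar can be shipped from the driver at runtime instead of being pre-installed on every node.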

Wildfire answered Oct 18 '22