Spark-Shell: How to define JAR loading order

I'm running spark-shell locally and want to put some 3rd-party JARs on the classpath:

$ spark-shell --driver-class-path /Myproject/LIB/*

Within the shell, I typed

scala> import com.google.common.collect.Lists
<console>:19: error: object collect is not a member of package com.google.common
   import com.google.common.collect.Lists
                            ^

I suppose Spark loaded /usr/local/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar first, which doesn't contain the com.google.common.collect package.

/Myproject/LIB/ contains google-collections-1.0.jar, which does provide com.google.common.collect. However, this jar seems to be ignored.
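One way to check which JAR (if any) is actually serving the class (a diagnostic added here, not part of the original question) is to ask the shell's classloader where it resolves the class file:

scala> getClass.getClassLoader.getResource("com/google/common/collect/Lists.class")

A null result means nothing on the classpath provides that class; a jar:file:... URL names the JAR that wins.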

Question: How do I tell spark-shell to load the JARs in --driver-class-path before those in spark-1.4.0-bin-hadoop2.6/lib/?

ANSWER (combining hints from Sathish's and Holden's comments): --jars must be used instead of --driver-class-path. All jar files must be specified, and they must be comma-delimited with no spaces (as per spark-shell --help):

$ spark-shell --jars $(echo ./Myproject/LIB/*.jar | tr ' ' ',')
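The command substitution just expands the shell glob and turns the spaces between paths into commas; for example (the second JAR name is hypothetical, only to show the shape of the output):

$ echo ./Myproject/LIB/*.jar | tr ' ' ','
./Myproject/LIB/google-collections-1.0.jar,./Myproject/LIB/some-other.jar

Note that this breaks if any path contains a space, since tr would turn that space into a comma too.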
asked Jun 13 '15 by Polymerase

1 Answer

The driver class path flag needs to be comma-separated. So, based on Setting multiple jars in java classpath, we can try:

$ spark-shell --driver-class-path $(echo ./Myproject/LIB/*.jar | tr ' ' ',')
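As an aside (not from the thread): on Linux, --driver-class-path is normally treated like a regular JVM classpath, so if the comma-separated form doesn't work, a colon-separated list is worth trying:

$ spark-shell --driver-class-path "$(echo /Myproject/LIB/*.jar | tr ' ' ':')"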

answered Sep 22 '22 by Holden