The page here (http://spark.apache.org/docs/latest/programming-guide.html) indicates packages can be included when the shell is launched via:
$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.4.0
What is the syntax for including local packages (say, jars that were downloaded manually)? Something to do with Maven coordinates?
The first step is to download Spark from this link (in my case I put it in the home directory). Then unzip the folder using the command line, or by right-clicking on the *.tar file. The following figure shows my unzipped folder, from where I would run Spark.
To run an application on the Spark cluster, simply pass the spark://IP:PORT URL of the master to the SparkContext constructor. You can also pass the option --total-executor-cores <numCores> to control the number of cores that spark-shell uses on the cluster.
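For example, a minimal launch might look like the following (the master URL and core count are placeholders, so substitute your own cluster's values):
# Placeholder master URL and core count
./bin/spark-shell --master spark://192.168.1.10:7077 --total-executor-cores 4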
If the jars are already present on the master and the workers, you simply need to put them on the classpath when launching spark-shell (the same flags work for spark-submit):
spark-shell \
  --conf spark.driver.extraClassPath="/path/to/jar/spark-csv_2.11.jar" \
  --conf spark.executor.extraClassPath="spark-csv_2.11.jar"
If the jars are only present on the master and you want them to be shipped to the workers (this only works in client mode), you can add the --jars flag:
spark-shell \
  --conf spark.driver.extraClassPath="/path/to/jar/spark-csv_2.11.jar" \
  --conf spark.executor.extraClassPath="spark-csv_2.11.jar" \
  --jars "/path/to/jar/jary.jar,/path/to/other/other.jar"
Note that --jars takes a comma-separated list of paths, not a colon-separated one.
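Once the shell starts (with either of the commands above), you can check that the package is actually usable by reading a file with it. The sketch below assumes Spark 1.x with the spark-csv package; the CSV path is a placeholder:
// Inside spark-shell (Spark 1.x): sqlContext is created for you.
// The CSV path below is a placeholder.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/path/to/file.csv")
df.printSchema()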
For a more detailed answer, see Add jars to a Spark Job - spark-submit.
Please use:
./spark-shell --jars my_jars_to_be_included
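For example (the jar paths are placeholders; --jars takes a comma-separated list of local jar files):
# Placeholder paths to locally downloaded jars
./spark-shell --jars /path/to/spark-csv_2.11.jar,/path/to/commons-csv.jar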
There is an open question related to this: please check this question out.