 

Automatically including jars to PySpark classpath

I'm trying to automatically include jars to my PySpark classpath. Right now I can type the following command and it works:

$ pyspark --jars /path/to/my.jar

I'd like to have that jar included by default so that I only need to type pyspark, and so that it also works in IPython Notebook.

I've read that I can pass the argument by setting PYSPARK_SUBMIT_ARGS in my environment:

export PYSPARK_SUBMIT_ARGS="--jars /path/to/my.jar"

Unfortunately the above doesn't work. I get the runtime error Failed to load class for data source.

Running Spark 1.3.1.

Edit

My workaround when using IPython Notebook is the following:

$ IPYTHON_OPTS="notebook" pyspark --jars /path/to/my.jar
asked Jul 16 '15 by Kamil Sindi

People also ask

Where do I put JAR files in Spark?

It's better to pass the driver and executor class paths via --conf, which adds them to the Spark session itself so that they are reflected in the Spark configuration. But make sure the JAR files are available at the same path on every node of the cluster.
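As a sketch, that approach would look something like the following (the application script app.py and the JAR path are placeholders):

$ spark-submit \
    --conf spark.driver.extraClassPath=/path/to/my.jar \
    --conf spark.executor.extraClassPath=/path/to/my.jar \
    app.py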

What is a JAR file in spark-submit?

Spark JAR files let you package a project into a single file so it can be run on a Spark cluster. Many developers write Spark code in browser-based notebooks because they're unfamiliar with JAR files.
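For illustration, a typical spark-submit invocation with a packaged application JAR and extra dependency JARs might look like this (the class name and paths are hypothetical; note that, unlike the classpath properties, --jars takes a comma-separated list):

$ spark-submit \
    --class com.example.MyApp \
    --jars /path/to/dep1.jar,/path/to/dep2.jar \
    /path/to/my-app.jar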

What are spark packages?

spark-packages.org is an external, community-managed list of third-party libraries, add-ons, and applications that work with Apache Spark. You can add a package as long as you have a GitHub repository.
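Packages listed there can be pulled in with the --packages flag using their Maven coordinates, and Spark resolves and downloads them automatically. For example (the spark-csv coordinates below are just one package that was published for Spark 1.3.x):

$ pyspark --packages com.databricks:spark-csv_2.10:1.0.3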


1 Answer

You can add the JAR files in the spark-defaults.conf file (located in the conf folder of your Spark installation). If there is more than one JAR in the list, use : as the separator.

spark.driver.extraClassPath /path/to/my.jar
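For example, with a second JAR (the extra path is just a placeholder):

spark.driver.extraClassPath /path/to/my.jar:/path/to/other.jar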

This property is documented at https://spark.apache.org/docs/1.3.1/configuration.html#runtime-environment
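Note that this property only affects the driver. If the classes are also needed by tasks running on the executors, you can set the analogous spark.executor.extraClassPath property (documented on the same page) to the same value:

spark.executor.extraClassPath /path/to/my.jar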

answered Oct 30 '22 by Diego Rodríguez