
What is the use of --driver-class-path in the spark command?

Tags:

apache-spark

As per the Spark docs:

To get started you will need to include the JDBC driver for your particular database on the spark classpath. For example, to connect to postgres from the Spark Shell you would run the following command:

bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar

My job works fine without --driver-class-path. So what is the use of --driver-class-path in the spark command?
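(In other words, the JDBC read seems to succeed when the driver jar is supplied with --jars alone, presumably something like:

bin/spark-shell --jars postgresql-9.4.1207.jar)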

asked Apr 06 '17 by Dev


1 Answer

--driver-class-path or spark.driver.extraClassPath can be used to modify the classpath of the Spark driver only. This is useful for libraries which are not required by the executors (for example, any code that is used only locally).
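For instance, either form below puts a jar on the driver's classpath only (the jar path here is made up for illustration):

bin/spark-shell --driver-class-path /opt/libs/driver-only-lib.jar

bin/spark-shell --conf spark.driver.extraClassPath=/opt/libs/driver-only-lib.jar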

Compared to that, --jars or spark.jars will not only add jars to both the driver and executor classpath, but also distribute the archives over the cluster. If a particular jar is used only by the driver, this is unnecessary overhead.
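As a sketch (the application class and jar names below are hypothetical), a library referenced only in driver-side code can be supplied with --driver-class-path alone, while a library the executors also need should go through --jars so it is shipped to every node:

bin/spark-submit --class com.example.MyApp --driver-class-path driver-only-lib.jar my-app.jar

bin/spark-submit --class com.example.MyApp --jars shared-lib.jar my-app.jar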

answered Oct 12 '22 by zero323