What is the difference between these settings, and what is each one used for? How do I fix a machine as the driver in a Spark standalone cluster?
The ApplicationMaster connects to the Spark driver via spark.driver.host.
The Spark driver binds to spark.driver.bindAddress on the client machine.
With only the port fixed:
.config('spark.driver.port', '50243')
netstat -ano on Windows then shows:
TCP 172.18.1.194:50243 0.0.0.0:0 LISTENING 15332
TCP 172.18.1.194:50243 172.18.7.122:54451 ESTABLISHED 15332
TCP 172.18.1.194:50243 172.18.7.124:37412 ESTABLISHED 15332
TCP 172.18.1.194:50243 172.18.7.142:41887 ESTABLISHED 15332
TCP [::]:4040 [::]:0 LISTENING 15332
The cluster nodes (172.18.7.1xx) are on the same network as my development machine (172.18.1.194), since my netmask is 255.255.248.0.
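This can be verified with Python's ipaddress module (a quick check, nothing Spark-specific):

import ipaddress

# 255.255.248.0 is a /21 mask, so the subnet spans 172.18.0.0 - 172.18.7.255.
net = ipaddress.ip_network('172.18.1.194/21', strict=False)
print(net)                                          # 172.18.0.0/21
print(ipaddress.ip_address('172.18.7.122') in net)  # True: same network as the driver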
With
.config('spark.driver.host', '192.168.132.1')
netstat -ano shows:
TCP 192.168.132.1:58555 0.0.0.0:0 LISTENING 9480
TCP 192.168.132.1:58641 0.0.0.0:0 LISTENING 9480
TCP [::]:4040 [::]:0 LISTENING 9480
However, the ApplicationMaster cannot connect, and the reported error is
Caused by: java.net.NoRouteToHostException: No route to host
because this IP is a VM bridge interface on my development machine.
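The same check confirms it: the bridge address is outside the cluster's subnet, so the nodes have no route back to the driver.

import ipaddress

net = ipaddress.ip_network('172.18.1.194/21', strict=False)
# The VM bridge address is not on the cluster's network, hence "No route to host".
print(ipaddress.ip_address('192.168.132.1') in net)  # False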
With
.config('spark.driver.host', '172.18.1.194')
.config('spark.driver.bindAddress', '192.168.132.1')
netstat -ano shows:
TCP 172.18.1.194:63937 172.18.7.101:8032 ESTABLISHED 17412
TCP 172.18.1.194:63940 172.18.7.102:9000 ESTABLISHED 17412
TCP 172.18.1.194:63952 172.18.7.121:50010 ESTABLISHED 17412
TCP 192.168.132.1:63923 0.0.0.0:0 LISTENING 17412
TCP [::]:4040 [::]:0 LISTENING 17412
Before explaining in detail, note that there are only three related conf variables:
spark.driver.host
spark.driver.port
spark.driver.bindAddress
There are NO variables like spark.driver.hostname or spark.local.ip, but there IS an environment variable called SPARK_LOCAL_IP.
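For illustration, a minimal sketch of where SPARK_LOCAL_IP fits in a PySpark session, assuming YARN client mode (the address is a placeholder):

import os
from pyspark.sql import SparkSession

# SPARK_LOCAL_IP is an environment variable, not a conf key, so it is set in the
# environment (before the driver JVM starts) rather than via .config(). When both
# are given, spark.driver.bindAddress takes precedence over it.
os.environ['SPARK_LOCAL_IP'] = '172.18.1.194'  # placeholder address

spark = SparkSession.builder.master('yarn').getOrCreate()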
Before explaining the variables, we first have to understand the application submission process.
Main roles of the machines:
There is one ApplicationMaster per application, which takes care of requesting resources from the cluster and monitoring the status of jobs (stages).
The ApplicationMaster always runs inside the cluster.
Placement of the Spark driver:
Let's say we are talking about client mode.
The Spark application can be submitted from a development machine, which acts as the client machine of the application as well as a client machine of the cluster.
Alternatively, the Spark application can be submitted from a node within the cluster (a master node, a worker node, or just a machine with no resource-manager role).
The client machine might not be on the same subnet as the cluster, and this is one case these variables try to deal with. Think of your internet connection: your laptop usually cannot be reached from anywhere on the globe the way google.com can.
At the beginning of the application submission process, spark-submit on the client side uploads the necessary files to the Spark master or YARN and negotiates resource requests. In this step the client connects to the cluster; the cluster's address is the destination that the client tries to connect to.
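Note that the destination of this first step is the master URL, not any spark.driver.* setting; a minimal sketch assuming YARN:

from pyspark.sql import SparkSession

# Step 1 direction: client -> cluster. The destination is the master URL; for
# 'yarn' it is resolved from yarn-site.xml on the client (the ResourceManager at
# port 8032 in the netstat output above). spark.driver.* plays no role here.
spark = SparkSession.builder.master('yarn').getOrCreate()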
Then the ApplicationMaster starts on the allocated resources.
Where the ApplicationMaster lands is by default effectively random and cannot be controlled by these variables; it is decided by the cluster's scheduler, if you're curious about this.
Then the ApplicationMaster tries to connect BACK to the Spark driver. This is where these conf variables take effect.
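Putting the two directions together, here is a sketch mirroring the third experiment from the question (placeholder addresses, YARN client mode assumed):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master('yarn')  # client deploy mode is the default for an interactive driver
    # Step 2 direction: ApplicationMaster -> driver. This is the address the
    # driver ADVERTISES, so it must be reachable from inside the cluster.
    .config('spark.driver.host', '172.18.1.194')
    # The local interface the driver's listening sockets actually bind to.
    .config('spark.driver.bindAddress', '192.168.132.1')
    .getOrCreate()
)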