So far I have run Spark only on Linux machines and VMs (bridged networking), but now I am interested in utilizing more computers as slaves. It would be handy to distribute a Spark slave Docker container to computers and have them automatically connect to a hard-coded Spark master IP. This sort of works already, but I am having trouble configuring the right SPARK_LOCAL_IP (or --host parameter for start-slave.sh) on slave containers.
I think I correctly configured the SPARK_PUBLIC_DNS env variable to match the host machine's network-accessible IP (from the 10.0.x.x address space). At least it is shown on the Spark master web UI and is accessible by all machines.
I have also set SPARK_WORKER_OPTS and Docker port forwards as instructed at http://sometechshit.blogspot.ru/2015/04/running-spark-standalone-cluster-in.html, but in my case the Spark master is running on another machine and not inside Docker. I am launching Spark jobs from another machine within the network, which may also be running a slave itself.
Things that I've tried:
I wonder why the configured SPARK_PUBLIC_DNS isn't being used when connecting to slaves. I thought SPARK_LOCAL_IP would only affect local binding and would not be revealed to external computers.
At https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html they instruct to "set SPARK_LOCAL_IP to a cluster-addressable hostname for the driver, master, and worker processes". Is this the only option? I would like to avoid the extra DNS configuration and just use IPs to configure traffic between computers. Or is there an easy way to achieve this?
Edit: To summarize the current set-up:
I think I found a solution for my use-case (one Spark container / host OS):
Use --net host with docker run => the host's eth0 is visible in the container.
Set SPARK_PUBLIC_DNS and SPARK_LOCAL_IP to the host's IP and ignore docker0's 172.x.x.x address.
Spark can then bind to the host's IP and other machines can communicate with it as well; port forwarding takes care of the rest. DNS or any complex configs were not needed. I haven't thoroughly tested this, but so far so good.
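The two steps above can be sketched as a small Python helper that assembles the docker run command line. This is only an illustration, not the exact command I used: the image name my-spark-worker, the /opt/spark install path inside the container, and the example addresses are all hypothetical, and launching the worker through spark-class is an assumption about how the image is built.

```python
import ipaddress
import shlex


def build_worker_command(host_ip, master_url, image="my-spark-worker"):
    """Return the argv for a `docker run` that starts one Spark worker.

    `image` and the /opt/spark path are hypothetical placeholders;
    adjust them to match your own container image.
    """
    # Fail early on a malformed address (raises ValueError if invalid).
    ipaddress.ip_address(host_ip)
    return [
        "docker", "run", "-d",
        # Share the host's network stack so eth0 is visible in the container.
        "--net", "host",
        # Advertise and bind to the host's IP, not docker0's 172.x.x.x one.
        "-e", f"SPARK_PUBLIC_DNS={host_ip}",
        "-e", f"SPARK_LOCAL_IP={host_ip}",
        image,
        # Assumed way of starting the worker in the foreground.
        "/opt/spark/bin/spark-class",
        "org.apache.spark.deploy.worker.Worker",
        master_url,
    ]


if __name__ == "__main__":
    # Example addresses from the 10.0.x.x space; replace with your own.
    cmd = build_worker_command("10.0.0.12", "spark://10.0.0.1:7077")
    print(shlex.join(cmd))
```

Printing the command instead of executing it makes it easy to check the flags before running it on each machine; the same list could be passed to subprocess.run once it looks right.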
Edit: Note that these instructions are for Spark 1.x. In Spark 2.x only SPARK_PUBLIC_DNS is required; I think SPARK_LOCAL_IP is deprecated.