Connecting to remote master on standalone Spark

I launch Spark in standalone mode on my remote server by following these steps:

  • cp spark-env.sh.template spark-env.sh
  • append SPARK_MASTER_HOST=IP_OF_MY_REMOTE_SERVER to spark-env.sh
  • run the following commands for standalone mode: sbin/start-master.sh and sbin/start-slave.sh spark://IP_OF_MY_REMOTE_SERVER:7077
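The steps above amount to something like the following shell session (the install path is an assumption, and IP_OF_MY_REMOTE_SERVER is a placeholder for the server's address):

```shell
# Assumed Spark installation directory
cd /opt/spark

# Configure the address the standalone master binds to
cp conf/spark-env.sh.template conf/spark-env.sh
echo 'SPARK_MASTER_HOST=IP_OF_MY_REMOTE_SERVER' >> conf/spark-env.sh

# Start the master, then attach a worker to it
sbin/start-master.sh
sbin/start-slave.sh spark://IP_OF_MY_REMOTE_SERVER:7077
```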

Then I try to connect to the remote master:

val spark = SparkSession.builder()
  .appName("SparkSample")
  .master("spark://IP_OF_MY_REMOTE_SERVER:7077")
  .getOrCreate()

I receive the following error:

ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries!

and warnings:

    WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
.....
    WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7092.
asked Aug 15 '17 by pacman

1 Answer

I advise against submitting Spark jobs remotely using the port-opening strategy: it can create security problems and, in my experience, is more trouble than it's worth, especially because you end up troubleshooting the communication layer.

Alternatives:

1) Livy - now an Apache project! http://livy.io or http://livy.incubator.apache.org/

2) Spark Job server - https://github.com/spark-jobserver/spark-jobserver
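For example, Livy exposes a REST API for submitting batch jobs. A minimal sketch, assuming a Livy server running on its default port 8998 and a placeholder jar path and class name:

```shell
# Submit a batch job through Livy's REST API
# (the jar path and class name below are illustrative assumptions)
curl -s -X POST http://IP_OF_MY_REMOTE_SERVER:8998/batches \
  -H 'Content-Type: application/json' \
  -d '{
        "file": "/path/to/SparkSample.jar",
        "className": "com.example.SparkSample"
      }'

# Poll the batch state (the batch id 0 is illustrative)
curl -s http://IP_OF_MY_REMOTE_SERVER:8998/batches/0
```

Only the Livy port needs to be reachable from the client; the Spark driver runs server-side, which sidesteps the bind errors above entirely.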

Similar Q&A: Submitting jobs to Spark EC2 cluster remotely

If you insist on connecting without a library like Livy, then you must open the ports needed for communication. The Spark network communication docs cover this: http://spark.apache.org/docs/latest/security.html#configuring-ports-for-network-security
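If you do go that route, the driver-side ports can be pinned to fixed values so firewall rules stay manageable. A sketch for conf/spark-defaults.conf on the client machine (the port numbers are arbitrary examples):

```
spark.driver.host        IP_OF_MY_CLIENT_MACHINE
spark.driver.port        40000
spark.blockManager.port  40001
```

Note that the driver binds these ports on the client machine, and the cluster must be able to connect back to them. If the driver tries to bind an address that doesn't belong to a local interface, you get exactly the "Cannot assign requested address: Service 'sparkDriver'" error from the question; spark.driver.bindAddress can be set separately from the advertised host when the two differ (e.g. behind NAT).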

Since you're not using YARN (per your standalone design), links about remote submission to YARN may not be relevant.

answered Sep 19 '22 by Garren S