I launch Spark in standalone mode on my remote server by following these steps:
cp spark-env.sh.template spark-env.sh
Then, in spark-env.sh, I set:
SPARK_MASTER_HOST=IP_OF_MY_REMOTE_SERVER
sbin/start-master.sh
sbin/start-slave.sh spark://IP_OF_MY_REMOTE_SERVER:7077
And I try to connect to remote master:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
.appName("SparkSample")
.master("spark://IP_OF_MY_REMOTE_SERVER:7077")
.getOrCreate()
And I receive the following errors:
ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries!
and warnings:
WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
.....
WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
I advise against submitting Spark jobs remotely using the port-opening strategy, because it can create security problems and is, in my experience, more trouble than it's worth, especially due to having to troubleshoot the communication layer; the BindException and port warnings above are exactly that kind of trouble.
Alternatives:
1) Livy - now an Apache project! http://livy.io or http://livy.incubator.apache.org/ (see the sketch after this list)
2) Spark Job server - https://github.com/spark-jobserver/spark-jobserver
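As an illustration of the Livy route, here is a minimal sketch of submitting a batch job through Livy's REST /batches endpoint from Scala. It assumes Livy runs on its default port 8998 on the same server; the jar path hdfs:///jars/spark-sample.jar and the class name SparkSample are hypothetical placeholders you would replace with your own artifact.

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object LivySubmit {
  def main(args: Array[String]): Unit = {
    // Livy's default REST port is 8998; POSTing to /batches submits a batch job.
    val url = new URL("http://IP_OF_MY_REMOTE_SERVER:8998/batches")

    // Hypothetical jar path and main class -- replace with your own.
    val payload =
      """{"file": "hdfs:///jars/spark-sample.jar", "className": "SparkSample"}"""

    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(payload.getBytes(StandardCharsets.UTF_8))

    // Livy answers with a JSON description of the batch (its id and state),
    // which you can then poll at /batches/{id}.
    println(s"HTTP ${conn.getResponseCode}")
    println(Source.fromInputStream(conn.getInputStream).mkString)
    conn.disconnect()
  }
}

The nice property of this approach is that only Livy's single HTTP port needs to be reachable from the client; all Spark-internal communication stays inside the cluster.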
Similar Q&A: Submitting jobs to Spark EC2 cluster remotely
If you insist on connecting without a library like Livy, then you will need to open the ports so the driver and the cluster can communicate. See the Spark docs on configuring ports for network security: http://spark.apache.org/docs/latest/security.html#configuring-ports-for-network-security
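If you do go down that road, a minimal sketch of the client-side session might look like the following. The properties (spark.driver.host, spark.driver.port, spark.blockManager.port) are standard Spark settings, but IP_OF_MY_LOCAL_MACHINE and the two port numbers are placeholders you would substitute and open in your firewall.

import org.apache.spark.sql.SparkSession

// Pin the driver-side services to fixed ports so a known set can be opened
// in the firewall, and advertise an address the cluster can route back to
// (IP_OF_MY_LOCAL_MACHINE is a placeholder for the client's reachable IP).
val spark = SparkSession.builder()
  .appName("SparkSample")
  .master("spark://IP_OF_MY_REMOTE_SERVER:7077")
  .config("spark.driver.host", "IP_OF_MY_LOCAL_MACHINE") // address executors connect back to
  .config("spark.driver.port", "40000")                  // fixed driver RPC port (arbitrary choice)
  .config("spark.blockManager.port", "40001")            // fixed block manager port (arbitrary choice)
  .getOrCreate()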
Since you're not using YARN (per your standalone design), the earlier link about remote YARN submission may not be relevant.