With today's release of Spark 2.0, native support for launching a Spark EC2 cluster on AWS has been removed:
https://spark.apache.org/releases/spark-release-2-0-0.html#removals-behavior-changes-and-deprecations
Spark EC2 script has been fully moved to an external repository hosted by the UC Berkeley AMPLab
The AMPLab GitHub page includes these instructions:
https://github.com/amplab/spark-ec2/tree/branch-2.0#launching-a-cluster
Go into the ec2 directory in the release of Apache Spark you downloaded.
The problem is there is no ec2 folder in the 2.0 download. Anyone know how I can launch a Spark 2.0 cluster in EC2?
Thanks in advance.
LAST EDIT
For anyone having this issue, the answer is simpler: here.
EDIT 2
I realized after the first edit that it is slightly more convoluted, so here's a new edit for anyone who might find it useful in the future.
The issue is that Spark no longer provides the ec2 directory as part of the official distribution. If you're used to spinning up standalone clusters this way, that's a problem.
The solution is simple:
1. Download the spark-ec2 script from the AMPLab repository and use the spark-ec2 executable to mimic the way things worked in Spark 1.*; you will be able to spin up your cluster as usual. But when you ssh into it, you'll realize that none of the binaries are there anymore.
2. Once the cluster is up (launched with the spark-ec2 you downloaded in step 1), you'll have to rsync your local directory containing Spark 2.0.0 into the master of your newly created cluster. Once this is done, you can spark-submit jobs as you normally do.

Really simple, but it seems to me the Spark docs could be clearer about this for all of us normies.
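A minimal sketch of those two steps, assuming the AMPLab script is in your working directory; the key pair, region, cluster name, Spark build directory, and master DNS are all placeholders:

    # Step 1: launch the cluster with the AMPLab spark-ec2 script
    ./spark-ec2 --key-pair=my-key --identity-file=my-key.pem \
        --region=us-east-1 --slaves=2 launch my-cluster

    # Step 2: push your local Spark 2.0.0 build onto the master
    # (take the master's public DNS from the launch output)
    rsync -avz -e "ssh -i my-key.pem" spark-2.0.0-bin-hadoop2.7/ \
        root@<master-public-dns>:/root/spark/

    # Then log in and spark-submit as usual
    ./spark-ec2 --key-pair=my-key --identity-file=my-key.pem \
        --region=us-east-1 login my-cluster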
EDIT: This was in fact the right thing to do. For anyone with the same question: download the ec2 dir from AMPLab like Spark suggests, put this folder inside your local Spark-2.0.0 dir, and fire up the scripts as usual. Apparently they only decoupled the directory for maintenance purposes, but the logic is still the same. It would be nice to have a few words about it in the Spark docs.
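In concrete terms that looks like the sketch below, assuming the Hadoop 2.7 build of Spark 2.0.0 (the directory and cluster names are illustrative):

    cd spark-2.0.0-bin-hadoop2.7
    # Pull the decoupled ec2 scripts back into the place they used to live
    git clone -b branch-2.0 https://github.com/amplab/spark-ec2.git ec2
    # Fire up a cluster as in Spark 1.x (key pair and cluster name are placeholders)
    ./ec2/spark-ec2 --key-pair=my-key --identity-file=my-key.pem launch my-cluster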
I tried the following: I cloned the spark-ec2-branch-1.6 directory from the AMPLab link into my spark-2.0.0 directory and attempted to launch a cluster with the usual ./ec2/spark-ec2 command. Maybe that's what they want us to do?

I'm launching a small 16-node cluster. I can see it in the AWS dashboard, but the terminal has been stuck printing the usual SSH error for the past... almost two hours.
Warning: SSH connection error. (This could be temporary.)
Host: ec2-54-165-25-18.compute-1.amazonaws.com
SSH return code: 255
SSH output: ssh: connect to host ec2-54-165-25-18.compute-1.amazonaws.com port 22: Connection refused
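While it hangs, a couple of things seem worth trying (the host is copied from the log above; the key and cluster names are placeholders, and --resume is the script's documented way to retry a partially set-up cluster):

    # Probe the master directly to see whether sshd is accepting connections yet
    ssh -i my-key.pem -o ConnectTimeout=10 \
        root@ec2-54-165-25-18.compute-1.amazonaws.com 'echo ok'

    # If the instances are running but setup stalled, re-run launch with --resume
    ./ec2/spark-ec2 --key-pair=my-key --identity-file=my-key.pem \
        launch my-cluster --resume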
Will update if I find anything useful.