I'm trying to run pip install on all the slave machines of a running EMR cluster. How can I do that?
I can't do it with a bootstrap action because it's a long-running cluster that I can't take down.
The cluster is running Spark and YARN, so I would normally use Spark's slaves.sh, but I can't find that script on the master node. Is it installed somewhere I haven't looked? Or is there some way to install it?
I've seen other questions that suggest using the YARN distributed shell, but I can't find a working example of how to do that.
BTW, the cluster is running EMR 4.8.0 with Spark 1.6.1, I believe.
One option is to submit a custom JAR step to run a script or command on Amazon EMR. When you use command-runner.jar, you specify the commands, options, and values in your step's list of arguments, and the AWS CLI can submit such a step to a running cluster.
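A minimal sketch of such a step (the cluster ID j-XXXXXXXXXXXXX and the package name are placeholders) might look like the following; note that command-runner.jar steps execute on the master node only, so this by itself won't install anything on the slaves:

# Submit a step to a running cluster that shells out to pip on the master.
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps Type=CUSTOM_JAR,Name="pip install",ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=["bash","-c","sudo pip install package"]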
To reach every node, you can run the yarn command on the master node to get the list of all nodes, then use SSH to run the command on each of them. As in the article mentioned before, you can run something like:
# Copy the cluster's SSH key (e.g. ssh_key.pem) to the master node.
aws s3 cp s3://bucket/ssh_key.pem ~/

# Restrict the key's permissions so ssh will accept it.
chmod 400 ~/ssh_key.pem

# Extract the worker hostnames from `yarn node -list` and run pip on each,
# up to 10 hosts in parallel (use "sudo pip install" if you need a
# system-wide install).
yarn node -list | sed -n 's/^\(ip[^:]*\):.*/\1/p' | xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no -i ~/ssh_key.pem hadoop@{} "pip install package"
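To confirm the install landed on every node, a quick hypothetical check (package is again a placeholder) reuses the same pipeline:

# Print each node's installed-package metadata; missing packages show an error.
yarn node -list | sed -n 's/^\(ip[^:]*\):.*/\1/p' | xargs -I{} ssh -o StrictHostKeyChecking=no -i ~/ssh_key.pem hadoop@{} "pip show package"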