
Run Command on EMR Slaves?

I'm trying to update a running EMR cluster by running pip install on all the slave machines. How can I do that?

I can't do it with a bootstrap action because it is a long-running EMR cluster and I can't take it down.

The EMR cluster is running Spark and YARN, so I would normally use Spark's slaves.sh, but I can't find that script on the master node. Is it installed somewhere I haven't looked, or is there some way to install it?
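For comparison, Spark's sbin/slaves.sh is (as far as I know) a thin wrapper: it reads host names from conf/slaves, skips comments and blank lines, and runs the given command on every host over ssh in parallel, prefixing each output line with the host name. A rough stand-in is sketched below, assuming passwordless SSH to every host; run_on_hosts, hosts.txt, SSH_CMD and SSH_OPTS are hypothetical names, not part of Spark or EMR.

```shell
#!/usr/bin/env bash
# Rough stand-in for Spark's sbin/slaves.sh (a sketch, not the real script).
# Usage: run_on_hosts HOSTS_FILE COMMAND...
# SSH_CMD and SSH_OPTS are hypothetical overrides, e.g. SSH_CMD=echo for a dry run.
run_on_hosts() {
  local hosts_file=$1
  shift
  # Drop '#' comment lines and blank lines, then start one ssh per host.
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$hosts_file" | {
    while read -r host; do
      # SSH_OPTS is left unquoted on purpose so it splits into separate options.
      # stdin comes from /dev/null so ssh cannot swallow the remaining host list.
      ${SSH_CMD:-ssh} ${SSH_OPTS:-} "$host" "$@" </dev/null 2>&1 | sed "s/^/$host: /" &
    done
    # Wait inside the same subshell that started the background jobs.
    wait
  }
}
```

For example, with hosts.txt holding one hadoop@host entry per line: SSH_OPTS="-i $HOME/ssh_key.pem -o StrictHostKeyChecking=no" run_on_hosts hosts.txt "pip install package". Output from different hosts may interleave because the commands run in parallel.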

I've seen other questions that suggest using YARN's distributed shell, but I can't find a working example of how to do that.

BTW, the cluster is using EMR 4.8.0, Spark 1.6.1, I believe.

asked Nov 30 '16 at 20:11 by enigmaticdatajunkie





1 Answer

You can run the yarn command on the master node to get the list of all nodes in the cluster, and then use SSH to run a command on each of those nodes. As in the article mentioned before, you can run something like this:

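```shell
# Copy the cluster's SSH key (e.g. ssh_key.pem) from S3 to the master node.
aws s3 cp s3://bucket/ssh_key.pem ~/

# Make the key readable by its owner only; ssh ignores private keys with looser permissions.
chmod 400 ~/ssh_key.pem

# List the nodes, extract their host names, and run the pip command on up to 10 nodes in parallel.
# "package" is a placeholder; a system-wide install may need "sudo pip install".
yarn node -list | sed -n "s/^\(ip[^:]*\):.*/\1/p" | xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no -i ~/ssh_key.pem hadoop@{} "pip install package"
```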
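The sed filter in this pipeline only keeps host names that begin with "ip", which matches the default EC2 internal DNS names but would silently skip nodes with custom host names. A sketch of an alternative filter, assuming the Hadoop 2.x column layout of yarn node -list (Node-Id, Node-State, Node-Http-Address, Number-of-Running-Containers); list_running_nodes is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: read `yarn node -list` output on stdin and print the host name of
# every node whose Node-State column is RUNNING.
# Assumed layout: Node-Id (host:port)  Node-State  Node-Http-Address  Number-of-Running-Containers
list_running_nodes() {
  # Keying on column 2 also drops the "Total Nodes" and header lines.
  # split() strips the ":port" suffix from the Node-Id.
  awk '$2 == "RUNNING" { split($1, parts, ":"); print parts[1] }'
}

# Example on the master node ("package" is a placeholder, as in the answer):
# yarn node -list 2>/dev/null | list_running_nodes |
#   xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no -i ~/ssh_key.pem hadoop@{} "pip install package"
```

Because the filter looks at the state column rather than a host-name prefix, it keeps working if the nodes use custom DNS names.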
answered Jan 05 '23 at 02:01 by jc mannem
