
AWS EMR perform "bootstrap" script on all the already running machines in cluster

I have an EMR cluster that runs 24/7. I can't turn it off and launch a new one.

What I would like to do is run something like a bootstrap action on the already running cluster, preferably using Python and boto or the AWS CLI.

I can imagine doing this in 2 steps:

1) running the script on all the running instances (it would be nice if this were possible from boto, for example)

2) adding the script to the bootstrap actions, in case I later want to resize the cluster.

So my question is: is something like this possible using boto, or at least the AWS CLI? I have been going through the documentation and the source code on GitHub, but I can't figure out how to add new "bootstrap" actions when the cluster is already running.

ziky90 asked Oct 26 '14 17:10



1 Answer

Late answer, but I'll give it a shot:

That is going to be tough.

You could install the Amazon SSM Agent and use its remote command interface (Run Command) to launch a command on all instances. You will have to assign the appropriate SSM roles to the instances, though, which AFAIK will require rebuilding the cluster. Once that is done, future commands will not require another rebuild.
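Before sending anything, it may help to check which of the cluster's instances SSM can actually reach. The following is a rough sketch only, not something I have run against a real cluster: `ssm_managed_ids` and `split_by_ssm_status` are hypothetical helper names, and the client passed in is assumed to behave like a boto3 SSM client (`describe_instance_information` with `NextToken` paging and a `PingStatus` field per instance).

```python
def ssm_managed_ids(ssm_client):
    """Return the set of instance IDs that SSM currently reports as Online."""
    online = set()
    kwargs = {}
    while True:
        page = ssm_client.describe_instance_information(**kwargs)
        for info in page.get("InstanceInformationList", []):
            if info.get("PingStatus") == "Online":
                online.add(info["InstanceId"])
        token = page.get("NextToken")
        if not token:
            return online
        # Ask for the next page of results.
        kwargs["NextToken"] = token


def split_by_ssm_status(cluster_instance_ids, ssm_client):
    """Split instance IDs into (reachable via SSM, not reachable via SSM)."""
    online = ssm_managed_ids(ssm_client)
    reachable = [i for i in cluster_instance_ids if i in online]
    unreachable = [i for i in cluster_instance_ids if i not in online]
    return reachable, unreachable
```

With boto3 this would presumably be called as `split_by_ssm_status(ids, boto3.client("ssm"))`. Anything in the second list most likely lacks the agent or the instance role permissions.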

You would then be able to use the CLI to run commands on all nodes (probably boto as well, haven't checked that).
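As a rough illustration of what that could look like from Python, here is a minimal sketch. It assumes the clients behave like boto3's EMR and SSM clients (`list_instances` with `Marker` paging, `send_command` with the `AWS-RunShellScript` document), and that the instances already run the SSM Agent with a suitable role. The function names and the batch size of 50 instance IDs per call are my own choices for illustration.

```python
def running_instance_ids(emr_client, cluster_id):
    """Collect the EC2 instance IDs of all RUNNING instances in the cluster."""
    ids = []
    kwargs = {"ClusterId": cluster_id, "InstanceStates": ["RUNNING"]}
    while True:
        page = emr_client.list_instances(**kwargs)
        ids.extend(inst["Ec2InstanceId"] for inst in page.get("Instances", []))
        marker = page.get("Marker")
        if not marker:
            return ids
        # More results are available; continue from the marker.
        kwargs["Marker"] = marker


def run_on_all_nodes(emr_client, ssm_client, cluster_id, commands, batch_size=50):
    """Send shell commands to every running node; return the SSM command IDs."""
    ids = running_instance_ids(emr_client, cluster_id)
    command_ids = []
    # Send in batches so a single call never carries too many instance IDs.
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        resp = ssm_client.send_command(
            InstanceIds=batch,
            DocumentName="AWS-RunShellScript",
            Parameters={"commands": list(commands)},
        )
        command_ids.append(resp["Command"]["CommandId"])
    return command_ids
```

With boto3 the call would presumably be `run_on_all_nodes(boto3.client("emr"), boto3.client("ssm"), cluster_id, ["bash /path/to/script.sh"])`, where the cluster ID and script path are placeholders. Note that this only covers nodes that exist right now; nodes added by a later resize would not get the script this way.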

Chris Chambers answered Sep 28 '22 06:09