I want to be able to create EMR clusters, and for those clusters to send messages back to some central queue. In order for this to work, I need to have some sort of agent running on each master node. Each one of those agents will have to identify itself in this message so that the recipient knows which cluster the message is about.
Does the master node know its ID (j-*************)? If not, then is there some other piece of identifying information that could allow the message recipient to infer this ID?
I've taken a look through the config files in /home/hadoop/conf, and I haven't found anything useful. I found the ID in /mnt/var/log/instance-controller/instance-controller.log, but it looks like it'll be difficult to grep for. I'm wondering where instance-controller might get that ID from in the first place.
Limitations of an EMR cluster with multiple master nodes: If any two master nodes fail simultaneously, Amazon EMR cannot recover the cluster. Amazon EMR clusters with multiple master nodes are not tolerant to Availability Zone failures.
The master node manages the cluster and typically runs master components of distributed applications.
You can launch an EMR cluster with multiple master nodes in both public and private VPC subnets.
View cluster status using the AWS CLI: You can use the describe-cluster command to view cluster-level details including status, hardware and software configuration, VPC settings, bootstrap actions, instance groups, and so on. For more information about cluster states, see Understanding the cluster lifecycle.
You can look in /mnt/var/lib/info/ on the master node to find a lot of information about your EMR cluster setup. More specifically, /mnt/var/lib/info/job-flow.json contains the jobFlowId, which is the cluster ID.
You can use the pre-installed JSON parser (jq) to get the job flow ID:
cat /mnt/var/lib/info/job-flow.json | jq -r ".jobFlowId"
(updated as per @Marboni)
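For an agent script, the lookup above can be wrapped in a small function. This is only a sketch: the function name emr_cluster_id and its optional path argument are hypothetical (the argument exists so the function can be tried against a sample file), and the sed fallback assumes the jobFlowId value is a plain quoted string on one line.

```shell
#!/bin/sh
# Sketch: print the cluster (job flow) ID recorded on an EMR master node.
# The default path is the one mentioned in the answer; the optional first
# argument is a hypothetical convenience for testing with a sample file.
emr_cluster_id() {
    info_file="${1:-/mnt/var/lib/info/job-flow.json}"

    if [ ! -r "$info_file" ]; then
        echo "cannot read $info_file" >&2
        return 1
    fi

    if command -v jq >/dev/null 2>&1; then
        # Same query as the one-liner above, without the extra cat.
        jq -r '.jobFlowId' "$info_file"
    else
        # Fallback when jq is missing: extract the quoted value that
        # follows the "jobFlowId" key (assumes a simple string value).
        sed -n 's/.*"jobFlowId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$info_file" | head -n 1
    fi
}
```

The agent could call it once at startup, for example CLUSTER_ID=$(emr_cluster_id) || exit 1, and include that value in every message it sends. The recipient can then pass the same ID to aws emr describe-cluster --cluster-id to look up the cluster.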