
Does an EMR master node know its cluster ID?

I want to be able to create EMR clusters, and for those clusters to send messages back to some central queue. In order for this to work, I need to have some sort of agent running on each master node. Each one of those agents will have to identify itself in this message so that the recipient knows which cluster the message is about.

Does the master node know its ID (j-*************)? If not, then is there some other piece of identifying information that could allow the message recipient to infer this ID?

I've taken a look through the config files in /home/hadoop/conf, and I haven't found anything useful. I found the ID in /mnt/var/log/instance-controller/instance-controller.log, but it looks like it'll be difficult to grep for. I'm wondering where instance-controller might get that ID from in the first place.

bstempi asked Nov 26 '13 20:11


1 Answer

You can look in /mnt/var/lib/info/ on the master node to find a lot of information about your EMR cluster setup. More specifically, /mnt/var/lib/info/job-flow.json contains the jobFlowId, which is the cluster ID.

You can use the pre-installed JSON parser (jq) to extract the job flow ID.

jq -r ".jobFlowId" /mnt/var/lib/info/job-flow.json

(updated as per @Marboni)
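
If the agent on the master node is written in Python rather than shell, the same lookup can be done with the standard library. The following is only a minimal sketch: it assumes job-flow.json is at the path given above and contains a top-level jobFlowId key, and the build_message helper and its clusterId/body field names are hypothetical, chosen purely to illustrate tagging a message with the cluster ID.

```python
import json
from pathlib import Path

# Path taken from the answer above; assumed to exist on the master node.
JOB_FLOW_INFO = Path("/mnt/var/lib/info/job-flow.json")


def read_cluster_id(info_path=JOB_FLOW_INFO):
    """Return the jobFlowId (the j-... cluster ID) stored in job-flow.json."""
    with open(info_path) as f:
        info = json.load(f)
    return info["jobFlowId"]


def build_message(body, info_path=JOB_FLOW_INFO):
    """Hypothetical envelope: wrap a payload together with the cluster ID.

    The field names "clusterId" and "body" are illustrative assumptions,
    not part of any EMR or queue API.
    """
    return json.dumps({"clusterId": read_cluster_id(info_path), "body": body})
```

The agent could then pass the string returned by build_message("step finished") to whatever queue client it uses, so the recipient can tell which cluster sent the message.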

jc mannem answered Sep 24 '22 04:09