
How to change yarn scheduler configuration on aws EMR?

Unlike Hortonworks or Cloudera, AWS EMR does not seem to provide any GUI for changing the XML configurations of the various Hadoop ecosystem frameworks.

Logging into my EMR namenode and doing a quick

find / -iname yarn-site.xml

I found yarn-site.xml at /etc/hadoop/conf.empty/yarn-site.xml and capacity-scheduler.xml at /etc/hadoop/conf.empty/capacity-scheduler.xml.

But note that these are under conf.empty, so I suspect they might not be the actual locations of the yarn-site and capacity-scheduler XML files.

I understand that I can set these configurations when creating a cluster, but what I need to know is how to change them without tearing down the cluster.

I just want to play around with scheduling properties and try out different schedulers to identify what might work well with my Spark applications.

Thanks in advance!

Kumar Vaibhav asked Apr 14 '17 02:04



1 Answer

Well, yarn-site.xml and capacity-scheduler.xml are indeed in the correct location (/etc/hadoop/conf.empty/). On a running cluster, editing them on the master node and restarting the YARN ResourceManager daemon will change the scheduler.

When spinning up a new cluster, you can use the EMR Configurations API to set the appropriate values: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

For example, specify the appropriate values in the capacity-scheduler and yarn-site classifications of your configuration, and EMR will write those values to the corresponding XML files.
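
To make the classification format concrete, here is a small Python sketch that builds such a configuration list and prints it as JSON. The helper name and the property value chosen for capacity-scheduler are illustrative assumptions, not recommendations; the Classification/Properties structure is the one described in the linked EMR documentation.

```python
import json

# Fully qualified scheduler class names shipped with Hadoop YARN.
FAIR_SCHEDULER = (
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"
)
CAPACITY_SCHEDULER = (
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"
)


def build_scheduler_configurations(scheduler_class, capacity_properties=None):
    """Return an EMR-style configuration list (hypothetical helper)."""
    configurations = [
        {
            "Classification": "yarn-site",
            "Properties": {
                "yarn.resourcemanager.scheduler.class": scheduler_class,
            },
        }
    ]
    if capacity_properties:
        configurations.append(
            {
                "Classification": "capacity-scheduler",
                "Properties": dict(capacity_properties),
            }
        )
    return configurations


# Illustrative value only; tune it for your own workload.
example = build_scheduler_configurations(
    CAPACITY_SCHEDULER,
    {"yarn.scheduler.capacity.maximum-am-resource-percent": "0.5"},
)
print(json.dumps(example, indent=2))
```

The printed JSON can be saved to a file and supplied at cluster creation, for instance through the --configurations option of aws emr create-cluster.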

Edit (Sep 4, 2019): With Amazon EMR version 5.21.0 and later, you can override cluster configurations and specify additional configuration classifications for each instance group in a running cluster. You do this by using the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDK.

Please see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html
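
With the AWS SDK for Python, reconfiguring a running instance group might look roughly like the sketch below. The two helper functions are hypothetical; emr_client is assumed to be a boto3 EMR client, and the cluster and instance group IDs are placeholders you would replace with your own.

```python
def build_reconfiguration_request(cluster_id, instance_group_id, configurations):
    """Build keyword arguments for the EMR ModifyInstanceGroups call.

    Hypothetical helper: it only assembles the request dictionary.
    """
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {
                "InstanceGroupId": instance_group_id,
                "Configurations": configurations,
            }
        ],
    }


def reconfigure_instance_group(emr_client, cluster_id, instance_group_id, configurations):
    """Send the reconfiguration request through a boto3 EMR client."""
    request = build_reconfiguration_request(
        cluster_id, instance_group_id, configurations
    )
    return emr_client.modify_instance_groups(**request)


# Example configuration that switches YARN to the Fair Scheduler.
fair_scheduler_config = [
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.resourcemanager.scheduler.class": (
                "org.apache.hadoop.yarn.server.resourcemanager"
                ".scheduler.fair.FairScheduler"
            )
        },
    }
]
```

A call would then be something like reconfigure_instance_group(boto3.client("emr"), "j-XXXXXXXXXXXXX", "ig-XXXXXXXXXXXX", fair_scheduler_config), which requires EMR 5.21.0 or later as noted above.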

jc mannem answered Oct 12 '22 12:10