Unlike HortonWorks or Cloudera, AWS EMR does not seem to provide a GUI for changing the XML configurations of the various Hadoop ecosystem frameworks.
Logging into my EMR namenode and running a quick
find / -iname yarn-site.xml
I found it at /etc/hadoop/conf.empty/yarn-site.xml, and capacity-scheduler at /etc/hadoop/conf.empty/capacity-scheduler.xml.
Note, however, that these are under conf.empty, so I suspect these might not be the actual locations of the yarn-site and capacity-scheduler XML files.
I understand that I can change these configurations when creating a cluster, but what I need to know is how to change them without tearing down the cluster.
I just want to play around with scheduling properties and try out different schedulers to identify what might work well with my Spark applications.
Thanks in advance!
In the directory containing your Hadoop installation, navigate to share/doc/hadoop/hadoop-yarn/hadoop-yarn-common. As you should know, yarn-default.xml serves as the documentation for the default values, as compared to yarn-site.xml, which represents your custom configuration values.
Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/ . Choose Create cluster. Choose Go to advanced options. Under Add steps (optional) select Auto-terminate cluster after the last step is completed.
Amazon EBS volumes attached to Amazon EMR clusters are ephemeral: the volumes are deleted upon cluster and instance termination (for example, when shrinking instance groups), so it's important that you not expect data to persist.
Well, the yarn-site.xml and capacity-scheduler.xml files are indeed in the correct location (/etc/hadoop/conf.empty/). On a running cluster, editing them on the master node and restarting the YARN ResourceManager daemon will change the scheduler.
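As a rough illustration of editing one of these files on the master node, here is a minimal Python sketch that sets a single property in a Hadoop-style XML file. The helper name set_hadoop_property is hypothetical, and the sketch assumes the usual <configuration><property><name/><value/></property></configuration> layout. Note that ElementTree drops XML comments such as the license header when it rewrites the file, so back up the original first.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_path, name, value):
    """Set (or add) one <property> in a Hadoop-style configuration file.

    Hypothetical helper for illustration; it rewrites the file in place.
    """
    tree = ET.parse(xml_path)
    root = tree.getroot()  # expected to be the <configuration> element

    for prop in root.findall("property"):
        # Match on the property name, ignoring surrounding whitespace.
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not present yet: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    tree.write(xml_path, encoding="utf-8", xml_declaration=True)
```

It could be called (as root) along the lines of set_hadoop_property("/etc/hadoop/conf.empty/yarn-site.xml", "yarn.resourcemanager.scheduler.class", "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"). The ResourceManager then has to be restarted; on systemd-based releases this is typically something like sudo systemctl restart hadoop-yarn-resourcemanager, but treat the service name as an assumption and check it on your own cluster.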
When spinning up a new cluster, you can use the EMR Configurations API to change the appropriate values: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
For example, specify the appropriate values in the capacity-scheduler and yarn-site classifications of your configuration, and EMR will change those values in the corresponding XML files.
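To make the classification idea concrete, here is a small Python sketch that builds the JSON structure EMR expects: a list of objects, each with a Classification and a Properties map. The helper name build_configurations and the property values are illustrative assumptions, not recommendations for any particular workload.

```python
import json


def build_configurations(yarn_site=None, capacity_scheduler=None):
    """Build a list of EMR configuration objects (hypothetical helper)."""
    configs = []
    if yarn_site:
        configs.append(
            {"Classification": "yarn-site", "Properties": dict(yarn_site)}
        )
    if capacity_scheduler:
        configs.append(
            {
                "Classification": "capacity-scheduler",
                "Properties": dict(capacity_scheduler),
            }
        )
    return configs


# Illustrative values only: switch YARN to the Fair Scheduler and set one
# capacity-scheduler property.
configurations = build_configurations(
    yarn_site={
        "yarn.resourcemanager.scheduler.class": (
            "org.apache.hadoop.yarn.server.resourcemanager."
            "scheduler.fair.FairScheduler"
        )
    },
    capacity_scheduler={
        "yarn.scheduler.capacity.maximum-am-resource-percent": "0.5"
    },
)

print(json.dumps(configurations, indent=2))
```

The printed JSON can be saved to a file and passed at cluster creation time, for example with the --configurations option of aws emr create-cluster.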
Edit: Sep 4, 2019 : With Amazon EMR version 5.21.0 and later, you can override cluster configurations and specify additional configuration classifications for each instance group in a running cluster. You do this by using the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDK.
Please see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html
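For the SDK route, a reconfiguration request might look roughly like the following sketch, which uses boto3's EMR client and its modify_instance_groups call. The helper names, the cluster ID, and the instance group ID are placeholders I made up for illustration; the real IDs can be looked up with list_instance_groups. This assumes EMR 5.21.0 or later and that boto3 and AWS credentials are available when apply_reconfig is actually called.

```python
def build_reconfig_request(cluster_id, instance_group_id, properties,
                           classification="yarn-site"):
    """Build keyword arguments for modify_instance_groups (hypothetical helper)."""
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {
                "InstanceGroupId": instance_group_id,
                "Configurations": [
                    {
                        "Classification": classification,
                        "Properties": dict(properties),
                    }
                ],
            }
        ],
    }


def apply_reconfig(request, region_name):
    """Send the request to EMR; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    emr = boto3.client("emr", region_name=region_name)
    return emr.modify_instance_groups(**request)


# Placeholder IDs, not real resources.
request = build_reconfig_request(
    cluster_id="j-EXAMPLECLUSTER",
    instance_group_id="ig-EXAMPLEGROUP",
    properties={
        "yarn.resourcemanager.scheduler.class": (
            "org.apache.hadoop.yarn.server.resourcemanager."
            "scheduler.fair.FairScheduler"
        )
    },
)
```

Calling apply_reconfig(request, region_name="us-east-1") would submit the change; the region here is also just an example.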