 

Limit YARN containers programmatically

I have 10 nodes with 32GB RAM in my Hadoop cluster, and one node with 64GB.

On these 10 nodes the limit yarn.nodemanager.resource.memory-mb is set to 26GB, and on the 64GB node to 52GB (I have some jobs that require 50GB for a single reducer; they run on this node).

The problem is that when I run basic jobs that require, say, 8GB per mapper, the 32GB nodes spawn 3 mappers in parallel (26 / 8 = 3) while the 64GB node spawns 6 mappers (52 / 8 = 6). This node usually finishes last because of the CPU load.

I'd like to limit the job's container resources programmatically, e.g. set the container limit to 26GB for most jobs. How can this be done?

asked Jul 05 '17 by AdamSkywalker


People also ask

How many containers does YARN allocate to a MapReduce application?

MapReduce requests three different kinds of containers from YARN: the application master container, map containers, and reduce containers. For each container type, there is a corresponding set of properties that can be used to set the resources requested.
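For example, the memory for each of the three container types can be requested from the job's driver code (a minimal sketch; the 2048/4096 values are illustrative assumptions, not recommendations):

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
conf.set("yarn.app.mapreduce.am.resource.mb", "2048"); // application master container
conf.set("mapreduce.map.memory.mb", "4096");           // map containers
conf.set("mapreduce.reduce.memory.mb", "4096");        // reduce containers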

How do you size a container for YARN?

So let's assume we have 100GB of total YARN cluster memory and 1GB minimum-allocation-mb, then we have 100 max containers. If we set the minimum allocation to 4GB, then we have 25 max containers. Each application will get the memory it asks for rounded up to the next container size.
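A minimal sketch of that arithmetic (plain Java, not a Hadoop API; the helper name roundUpToContainerSize is purely illustrative):

public class ContainerMath {
    // Round a memory request up to the next multiple of the minimum allocation.
    static int roundUpToContainerSize(int requestMb, int minAllocationMb) {
        return ((requestMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
    }

    public static void main(String[] args) {
        int clusterMemoryMb = 100 * 1024; // 100GB of total YARN memory
        int minAllocationMb = 4 * 1024;   // minimum-allocation-mb = 4GB

        // 100GB / 4GB = 25 containers at most
        System.out.println("Max containers: " + clusterMemoryMb / minAllocationMb);

        // A 5GB request is rounded up to the next container size: 8192MB
        System.out.println("5GB request becomes: "
                + roundUpToContainerSize(5 * 1024, minAllocationMb) + "MB");
    }
}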

How are containers allocated for job execution?

They are allocated by the Fair or Capacity Scheduler, and once dispatched to a node, it is guaranteed that there are available resources for their execution to start immediately. Moreover, these containers run to completion (as long as there are no failures).

What are running containers in YARN?

A YARN container is a process space where a given task runs in isolation, using resources from the resource pool. The ResourceManager has the authority to assign containers to applications. Each assigned container has a unique container ID and always runs on a single node.


1 Answer

First of all, yarn.nodemanager.resource.memory-mb (memory) and yarn.nodemanager.resource.cpu-vcores (vcores) are NodeManager daemon/service configuration properties and cannot be overridden in YARN client applications. You need to restart the NodeManager services if you change these configuration properties.

Since CPU is the bottleneck in your case, my recommendation is to change the YARN scheduling strategy to the Fair Scheduler with the DRF (Dominant Resource Fairness) scheduling policy at the cluster level. That gives you the flexibility to specify application container size in terms of both memory and CPU cores; the number of running application containers (mapper/reducer/AM tasks) will then be based on the available vcores that you define.

The scheduling policy can be set at the Fair Scheduler queue/pool level.

schedulingPolicy: sets the scheduling policy of a queue. The allowed values are "fifo", "fair", and "drf".
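For example, a DRF queue could be declared in the Fair Scheduler allocation file, fair-scheduler.xml (a sketch; the queue name "pool26" and the 26GB/8-vcore cap are illustrative assumptions matching the question):

<?xml version="1.0"?>
<allocations>
  <queue name="pool26">
    <schedulingPolicy>drf</schedulingPolicy>
    <!-- cap the resources this pool can use: 26GB memory, 8 vcores -->
    <maxResources>26624 mb, 8 vcores</maxResources>
  </queue>
</allocations>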

See the Apache Fair Scheduler documentation for more details: https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/FairScheduler.html

Once you have created a new Fair Scheduler queue/pool with the DRF scheduling policy, both memory and CPU cores can be set in the program as follows.


How to define the container size in a MapReduce application:

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();

// Memory per map/reduce container, in MB
conf.set("mapreduce.map.memory.mb", "4096");
conf.set("mapreduce.reduce.memory.mb", "4096");

// CPU cores per map/reduce container
conf.set("mapreduce.map.cpu.vcores", "1");
conf.set("mapreduce.reduce.cpu.vcores", "1");
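The job can then be submitted to that queue programmatically as well (the queue name "pool26" matches the illustrative allocation file above):

conf.set("mapreduce.job.queuename", "pool26");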

Reference - https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

The default cpu.vcores allocation for a mapper/reducer is 1. You can increase this value if it's a CPU-intensive application. Remember, if you increase this value, the number of mapper/reducer tasks running in parallel will be reduced.
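For example (assuming a node configured with yarn.nodemanager.resource.cpu-vcores = 8): with mapreduce.map.cpu.vcores = 1 the scheduler can place up to 8 mappers on that node as far as CPU is concerned, while raising it to 2 caps the node at 4 concurrent mappers, even if memory alone would allow more.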

answered Sep 25 '22 by SachinJ