Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

YARN: Containers and JVM

Can someone help me understand the relation between JVM and containers in YARN?

  • How JVMs are created, is it one JVM for each task? can multiple tasks run in the same JVM at the same time? (I'm aware of ubertasking where many tasks (maps/reduce) can run in same JVM one after the other).
  • Is it one JVM for each container? or multiple containers in a single JVM? or there is no relation between JVM and containers?
  • when a resource manager allocates containers for a job, does multiple tasks inside the same job use same container for tasks running in same node? or separate containers for each task based on availability?

pointers to some useful links will also be helpful.

like image 934
MikA Avatar asked Feb 10 '17 10:02

MikA


People also ask

What is the default yarn Vmem-PMEM-ratio?

In YARN, there is the option yarn.nodemanager.vmem-pmem-ratio that is set to 2.1 by default. If you allocate relatively small containers at ~1 GB this ratio can be low and you may often face the "Container is running beyond virtual memory limits" errors.

How much virtual memory does yarn container use?

Container is running beyond virtual memory limits. Current usage: 1.0 GB of 1.1 GB physical memory used; 2.9 GB of 2.4 GB virtual memory used. Killing container. So what is the virtual memory, how to solve such errors and why is the virtual memory size often so large? Let’s find a YARN container and investigate its memory usage:

How much memory does the JVM use on a container?

As the JVM is not aware of the memory restrictions applied on the container, in the absence of any external tuning parameters, it will use approximately 1 4 of the available memory for the heap space, which comes out to be around ~1.5-2GB. We can verify the same using the -XX:+PrintFlagsFinal:

How to launch a container from a YARN application?

On Linux environment the secure container executor is the LinuxContainerExecutor. It uses an external program called the container-executor to launch the container. This program has the setuid access right flag set which allows it to launch the container with the permissions of the YARN application user.


1 Answers

Is it one JVM for each container? or multiple containers in a single JVM? or there is no relation between JVM and containers?

Of course there exists a relation and it's one-to-one. For each container that needs to be created, a new java process(JVM) is spawned.

Now, if you are not running in uber mode, consider following:-

How JVMs are created, is it one JVM for each task? can multiple tasks run in the same JVM at the same time? (I'm aware of ubertasking where many tasks (maps/reduce) can run in same JVM one after the other).

See, tasks are scheduled to run on some node in the cluster. According to requirements(memory and cpu) of task, the capacity of a container is decided. Also there are certain parameters for this which you can find in links below.
Each task attempt is scheduled on a JVM.

when a resource manager allocates containers for a job, does multiple tasks inside the same job use same container for tasks running in same node? or separate containers for each task based on availability?

Separate containers for each task are spawned based on resource availability in the cluster.

Here are some links which very are helpful-
http://ercoppa.github.io/HadoopInternals/AnatomyMapReduceJob.html
https://blog.cloudera.com/blog/2015/09/untangling-apache-hadoop-yarn-part-1/
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

like image 157
Shubham Chaurasia Avatar answered Sep 29 '22 14:09

Shubham Chaurasia