Different ways of configuring the memory for the TaskTracker child processes (Map and Reduce tasks)

What is the difference between setting mapred.job.map.memory.mb and setting -Xmx in mapred.child.java.opts to control the maximum memory used by a Map or Reduce task? Which one takes precedence?

Praveen Sripati asked Nov 06 '11 14:11




-Xmx specifies the maximum heap size of the JVM. This is the space reserved for object allocation, and it is managed by the garbage collector. mapred.job.map.memory.mb, on the other hand, specifies the maximum virtual memory allowed for a Hadoop task subprocess. If you exceed the max heap size, the JVM throws an OutOfMemoryError.
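Because the two settings live in different places (one is a JVM flag buried inside mapred.child.java.opts, the other a plain number in MB), it can help to pull the heap size out of the opts string and put the two side by side. The sketch below does that; `xmx_mb` is a hypothetical helper, not part of Hadoop, and the job values are made up for illustration.

```python
import re
from typing import Optional

# Multipliers for the size suffixes accepted by -Xmx (k, m, g; no suffix = bytes).
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_mb(child_java_opts: str) -> Optional[float]:
    """Hypothetical helper: return the -Xmx value in a java-opts string, in MB.

    Returns None when no -Xmx flag is present. If the flag appears more
    than once, this sketch simply uses the last occurrence.
    """
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)(?=\s|$)", child_java_opts)
    if not matches:
        return None
    digits, unit = matches[-1]
    return int(digits) * _UNIT_BYTES[unit.lower()] / 1024 ** 2


# Hypothetical job settings: a 1 GB heap inside a 2 GB virtual-memory limit.
job_conf = {
    "mapred.child.java.opts": "-Xmx1024m -verbose:gc",
    "mapred.job.map.memory.mb": "2048",
}
heap = xmx_mb(job_conf["mapred.child.java.opts"])
vmem_limit = int(job_conf["mapred.job.map.memory.mb"])
print(heap, vmem_limit, vmem_limit - heap)  # 1024.0 2048 1024.0
```

The difference printed last is the room left for everything in the process that is not heap.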

The JVM may use more memory than the max heap size because it also needs space to store class definitions (PermGen space) and thread stacks. If the process uses more virtual memory than mapred.job.map.memory.mb, Hadoop kills it.

So neither one takes precedence over the other (they measure different aspects of memory usage): -Xmx is a parameter to the JVM, while mapred.job.map.memory.mb is a hard upper bound, enforced by Hadoop, on the virtual memory a task attempt can use.
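To make the "different aspects" point concrete, here is a rough model that evaluates the two limits independently: one check happens inside the JVM (heap versus -Xmx), the other outside it (whole-process virtual memory versus mapred.job.map.memory.mb). It is a simplification, not Hadoop's actual monitoring code: `TaskMemory` and `limits_exceeded` are hypothetical names, all sizes are in MB, and it assumes the process's virtual size is roughly the full -Xmx heap plus a fixed non-heap overhead.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TaskMemory:
    """Hypothetical snapshot of one task attempt's memory, all values in MB."""
    heap_demand_mb: float  # heap the task's live objects would need
    xmx_mb: float          # -Xmx from mapred.child.java.opts
    non_heap_mb: float     # PermGen, thread stacks, other native memory
    vmem_limit_mb: float   # mapred.job.map.memory.mb


def limits_exceeded(t: TaskMemory) -> Dict[str, bool]:
    # Check 1 (inside the JVM): the heap cannot grow past -Xmx, so a larger
    # demand ends in java.lang.OutOfMemoryError.
    heap_exceeded = t.heap_demand_mb > t.xmx_mb
    # Check 2 (outside the JVM): Hadoop looks at the whole process.
    # Simplifying assumption: virtual size = full -Xmx heap + non-heap overhead.
    vmem_exceeded = t.xmx_mb + t.non_heap_mb > t.vmem_limit_mb
    # The checks are independent; neither one overrides the other.
    return {"OutOfMemoryError": heap_exceeded, "killed by Hadoop": vmem_exceeded}


# Heap is fine, but -Xmx plus overhead does not fit in the virtual-memory limit.
print(limits_exceeded(TaskMemory(800, 1024, 300, 1024)))
# {'OutOfMemoryError': False, 'killed by Hadoop': True}

# Plenty of virtual memory, but the task needs more heap than -Xmx allows.
print(limits_exceeded(TaskMemory(1500, 1024, 300, 2048)))
# {'OutOfMemoryError': True, 'killed by Hadoop': False}
```

In practice this means mapred.job.map.memory.mb should be set comfortably above the -Xmx value, leaving headroom for the non-heap memory described above.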

Hope this is helpful; memory is complicated! I'm currently confused about why my JVM processes use several times the max heap size in virtual memory, as described in my SO post.

schmmd answered Oct 13 '22 17:10
