What is the difference between setting the mapred.job.map.memory.mb and mapred.child.java.opts using -Xmx to control the maximum memory used by a Mapper and Reduce task? Which one takes precedence?
-Xmx specifies the maximum heap size of the JVM launched for the task. This is the space reserved for object allocation and managed by the garbage collector. mapred.job.map.memory.mb, on the other hand, specifies the maximum virtual memory allowed for a Hadoop task subprocess. If you exceed the max heap size, the JVM throws an OutOfMemoryError.
The JVM may use more memory than the max heap size because it also needs space to store class definitions (permgen space) and thread stacks. If the process uses more virtual memory than mapred.job.map.memory.mb, it is killed by Hadoop.
So neither takes precedence over the other, because they measure different aspects of memory usage: -Xmx is a parameter to the JVM, while mapred.job.map.memory.mb is a hard upper bound on the virtual memory a task attempt can use, enforced by Hadoop.
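Since the two settings are independent, it can help to sanity-check that the heap you request with -Xmx leaves some room under the task limit. Below is a minimal sketch in plain Java, not part of Hadoop: the class TaskMemoryCheck, its methods, and the headroom figure are all hypothetical names and values chosen for illustration, and the right amount of headroom for non-heap memory depends on your workload.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hadoop API): compares the -Xmx value found in a
// java opts string against a task memory limit expressed in megabytes.
final class TaskMemoryCheck {
    // Matches -Xmx<number> followed by an optional k, m, or g unit suffix.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    /** Returns the first -Xmx value in the string, in MB, or -1 if none is present. */
    static long parseXmxMb(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) {
            return -1;
        }
        long value = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "g": return value * 1024;
            case "m": return value;
            case "k": return value / 1024;
            default:  return value / (1024 * 1024); // no suffix means bytes
        }
    }

    /**
     * True if heap plus an assumed non-heap headroom fits under the task limit.
     * headroomMb is a guess at permgen, stacks, and other native memory.
     */
    static boolean heapFitsInLimit(String javaOpts, long taskLimitMb, long headroomMb) {
        long heapMb = parseXmxMb(javaOpts);
        return heapMb >= 0 && heapMb + headroomMb <= taskLimitMb;
    }

    public static void main(String[] args) {
        String opts = "-Xmx1024m"; // value you would put in mapred.child.java.opts
        System.out.println(heapFitsInLimit(opts, 2048, 512)); // true: 1024 + 512 <= 2048
        System.out.println(heapFitsInLimit(opts, 1024, 512)); // false: 1024 + 512 > 1024
    }
}
```

Passing this check does not guarantee the task survives, because the limit applies to virtual memory, which can be well above heap plus a small margin. It only catches the obvious case where the heap alone nearly fills the limit.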
Hope this is helpful; memory is complicated! I'm presently confused about why my JVM processes use several multiples of the max heap size in virtual memory, which I describe in my SO post.