Spark: executor memory exceeds physical limit

Question

My input dataset is about 150G. I am setting

--conf spark.cores.max=100 
--conf spark.executor.instances=20 
--conf spark.executor.memory=8G 
--conf spark.executor.cores=5 
--conf spark.driver.memory=4G

but since data is not evenly distributed across executors, I kept getting

Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB physical memory used

here are my questions:

1. Did I not set up enough memory in the first place? I think 20 * 8G > 150G, but it's hard to make perfect distribution, so some executors will suffer
2. I think about repartition the input dataFrame, so how can I determine how many partition to set? the higher the better, or?
3. The error says "9 GB physical memory used", but i only set 8G to executor memory, where does the extra 1G come from?

Thank you!

Ryan Widmaier · Accepted Answer

When using yarn, there is another setting that figures into how big to make the yarn container request for your executors:

spark.yarn.executor.memoryOverhead

It defaults to 0.1 * your executor memory setting. It defines how much extra overhead memory to ask for in addition to what you specify as your executor memory. Try increasing this number first.

Also, a yarn container won't give you memory of an arbitrary size. It will only return containers allocated with a memory size that is a multiple of it's minimum allocation size, which is controlled by this setting:

yarn.scheduler.minimum-allocation-mb

Setting that to a smaller number will reduce the risk of you "overshooting" the amount you asked for.

I also typically set the below key to a value larger than my desired container size to ensure that the spark request is controlling how big my executors are, instead of yarn stomping on them. This is the maximum container size yarn will give out.

nodemanager.resource.memory-mb

Spark: executor memory exceeds physical limit

Tags:

apache-spark

spark-dataframe

user2628641

1 Answers

Ryan Widmaier

Recent Activity

Donate For Us

Spark: executor memory exceeds physical limit

Tags:

apache-spark

spark-dataframe

user2628641

1 Answers

Ryan Widmaier

Related questions

Recent Activity

Donate For Us