AWS Glue - can't set spark.yarn.executor.memoryOverhead

When running a python job in AWS Glue I get the error:

Reason: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead

When running this in the beginning of the script:

print '--- Before Conf --'
print 'spark.yarn.driver.memory', sc._conf.get('spark.yarn.driver.memory')
print 'spark.yarn.driver.cores', sc._conf.get('spark.yarn.driver.cores')
print 'spark.yarn.executor.memory', sc._conf.get('spark.yarn.executor.memory')
print 'spark.yarn.executor.cores', sc._conf.get('spark.yarn.executor.cores')
print 'spark.yarn.executor.memoryOverhead', sc._conf.get('spark.yarn.executor.memoryOverhead')

print '--- Conf --'
sc._conf.setAll([
    ('spark.yarn.executor.memory', '15G'),
    ('spark.yarn.executor.memoryOverhead', '10G'),
    ('spark.yarn.driver.cores', '5'),
    ('spark.yarn.executor.cores', '5'),
    ('spark.yarn.cores.max', '5'),
    ('spark.yarn.driver.memory', '15G'),
])

print '--- After Conf ---'
print 'spark.yarn.driver.memory', sc._conf.get('spark.yarn.driver.memory')
print 'spark.yarn.driver.cores', sc._conf.get('spark.yarn.driver.cores')
print 'spark.yarn.executor.memory', sc._conf.get('spark.yarn.executor.memory')
print 'spark.yarn.executor.cores', sc._conf.get('spark.yarn.executor.cores')
print 'spark.yarn.executor.memoryOverhead', sc._conf.get('spark.yarn.executor.memoryOverhead')

I get the following output:

--- Before Conf --
spark.yarn.driver.memory None
spark.yarn.driver.cores None
spark.yarn.executor.memory None
spark.yarn.executor.cores None
spark.yarn.executor.memoryOverhead None
--- Conf --
--- After Conf ---
spark.yarn.driver.memory 15G
spark.yarn.driver.cores 5
spark.yarn.executor.memory 15G
spark.yarn.executor.cores 5
spark.yarn.executor.memoryOverhead 10G

It looks like spark.yarn.executor.memoryOverhead is set, so why is it not recognized? I still get the same error.

I have seen other posts about problems setting spark.yarn.executor.memoryOverhead, but none where it appears to be set and still has no effect.

asked Aug 23 '18 by KDilla


2 Answers

  • Open Glue > Jobs > Edit your Job > Script libraries and job parameters (optional) > Job parameters near the bottom

  • Set the following > key: --conf value: spark.yarn.executor.memoryOverhead=1024
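If you prefer to script this instead of clicking through the console, the same job parameter can be set via boto3. A minimal sketch, assuming a job named my-glue-job (a placeholder, not from the original answer); note that UpdateJob replaces the entire job definition, so the current definition is fetched first and its required fields carried over:

import boto3

glue = boto3.client('glue')

# 'my-glue-job' is a placeholder name.
# UpdateJob replaces the whole definition, so start from the current one.
job = glue.get_job(JobName='my-glue-job')['Job']

args = dict(job.get('DefaultArguments', {}))
args['--conf'] = 'spark.yarn.executor.memoryOverhead=1024'

glue.update_job(
    JobName='my-glue-job',
    JobUpdate={
        'Role': job['Role'],
        'Command': job['Command'],
        'DefaultArguments': args,
    },
)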

answered Oct 05 '22 by HarshMarshmallow


Unfortunately, the current version of Glue doesn't support this functionality. You cannot set these parameters other than through the UI. In your case, instead of AWS Glue, you could use the AWS EMR service.
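For context: on EMR (or any Spark-on-YARN cluster) these settings only take effect when supplied before the SparkContext is created, for example via a SparkConf or spark-submit --conf. Calling sc._conf.setAll() on a context that already exists, as in the question, updates the local configuration object but does not resize the YARN containers. A minimal sketch (the values are illustrative):

from pyspark import SparkConf, SparkContext

# Overhead and executor sizing must be fixed *before* the context exists;
# mutating sc._conf afterwards has no effect on container allocation.
conf = (SparkConf()
        .set('spark.yarn.executor.memoryOverhead', '2048')
        .set('spark.executor.memory', '10g')
        .set('spark.executor.cores', '5'))

sc = SparkContext(conf=conf)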

When I had a similar problem, I tried to reduce the number of shuffles and the amount of data shuffled, and to increase the DPU count (see the sketch after the links below). While working on this problem I relied on the following articles. I hope they will be useful.

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/

https://www.indix.com/blog/engineering/lessons-from-using-spark-to-process-large-amounts-of-data-part-i/

https://umbertogriffo.gitbooks.io/apache-spark-best-practices-and-tuning/content/sparksqlshufflepartitions_draft.html
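To make the shuffle-reduction advice concrete, here is a rough PySpark sketch; the bucket paths, column names, and partition count are illustrative assumptions, not taken from the original answer:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# Hypothetical inputs; substitute your own sources.
events = spark.read.parquet('s3://example-bucket/events/')
users = spark.read.parquet('s3://example-bucket/users/')

# 1. Filter and project early so less data crosses the shuffle boundary.
purchases = (events
             .where(events.event_type == 'purchase')
             .select('user_id', 'amount'))

# 2. Lower the shuffle partition count when the default (200) yields tiny tasks.
spark.conf.set('spark.sql.shuffle.partitions', '64')

# 3. Broadcast the small side of the join so the large side is not shuffled.
joined = purchases.join(broadcast(users), 'user_id')

joined.write.mode('overwrite').parquet('s3://example-bucket/output/')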


Updated: 2019-01-13

Amazon recently added a new section to the AWS Glue documentation that describes how to monitor and optimize Glue jobs. I think it is very useful for understanding where a memory-related problem comes from and how to avoid it.

https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-glue-job-cloudwatch-metrics.html

answered Oct 05 '22 by j.b.gorski