AWS Glue executor memory limit

I found that AWS Glue sets up executor instances with a memory limit of 5 GB (--conf spark.executor.memory=5g), and sometimes, on big datasets, it fails with java.lang.OutOfMemoryError. The same applies to the driver instance (--conf spark.driver.memory=5g). Is there any option to increase this value?

asked Feb 28 '18 by Alexey Bakulin


People also ask

How do I increase executor memory in AWS Glue?

Scaling the Apache Spark driver and executors (vertical scaling): you can use Glue's G.1X and G.2X worker types, which provide more memory and disk space, to vertically scale Glue jobs that need high memory or disk space to store intermediate shuffle output.
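
For example, a minimal boto3 sketch (the job name, role ARN, and script location are hypothetical placeholders) that moves an existing job onto the larger G.2X workers:

    import boto3

    glue = boto3.client("glue")

    # Switch the job to G.2X workers, which offer more memory and disk
    # per worker than the Standard or G.1X types. Note that JobUpdate
    # replaces the whole job definition, so Role and Command must be
    # restated alongside the new worker settings.
    glue.update_job(
        JobName="my-etl-job",  # hypothetical job name
        JobUpdate={
            "Role": "arn:aws:iam::123456789012:role/GlueJobRole",  # placeholder
            "Command": {
                "Name": "glueetl",
                "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder
            },
            "GlueVersion": "2.0",
            "WorkerType": "G.2X",
            "NumberOfWorkers": 10,
        },
    )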

What is maximum capacity in AWS Glue?

Maximum capacity: choose an integer from 2 to 100. The default is 10. This job type cannot have a fractional DPU allocation. For AWS Glue version 2.0 or later jobs, you cannot specify Maximum capacity; instead, you specify a Worker type and the Number of workers.

How much data can glue process?

According to the Glue API docs, the maximum you can allocate per job run is 100 DPUs: AllocatedCapacity – Number (integer), the number of AWS Glue data processing units (DPUs) allocated to runs of this job. From 2 to 100 DPUs can be allocated; the default is 10.
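
As a sketch under those limits (job name hypothetical), the capacity can be requested per run through boto3's start_job_run; MaxCapacity is the newer counterpart of the AllocatedCapacity field cited above:

    import boto3

    glue = boto3.client("glue")

    # Ask for the documented maximum of 100 DPUs for this run.
    # (For Glue 2.0+ jobs, specify WorkerType/NumberOfWorkers instead.)
    response = glue.start_job_run(
        JobName="my-etl-job",   # hypothetical job name
        MaxCapacity=100.0,      # DPUs; valid range is 2-100 for glueetl jobs
    )
    print(response["JobRunId"])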


1 Answer

Despite the AWS documentation stating that the --conf parameter should not be passed, our AWS support team told us to pass --conf spark.driver.memory=10g, which corrected the issue we were having.
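
For reference, a sketch of passing that setting at run time with boto3 (the job name is hypothetical); the --conf key simply goes into the job run arguments:

    import boto3

    glue = boto3.client("glue")

    # AWS marks --conf as a reserved parameter, but per the answer above
    # it can be passed as a job argument to raise the driver memory.
    glue.start_job_run(
        JobName="my-etl-job",  # hypothetical job name
        Arguments={"--conf": "spark.driver.memory=10g"},
    )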

answered Sep 21 '22 by xtreampb