I found that AWS Glue sets up executor instances with a memory limit of 5 GB (--conf spark.executor.memory=5g), and sometimes, on big datasets, it fails with java.lang.OutOfMemoryError. The same applies to the driver instance (--conf spark.driver.memory=5g).
Is there any option to increase this value?
Scaling the Apache Spark driver and Apache Spark executors (vertical scaling): You can also use Glue's G.1X and G.2X worker types, which provide more memory and disk space, to vertically scale Glue jobs that need high memory or disk space to store intermediate shuffle output.
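As a rough sketch of what that looks like in practice, assuming you create the job with boto3 (the job name, role ARN, and script location below are placeholders, not from the question):

```python
# Sketch: defining a Glue job on the G.2X worker type so each worker gets more
# memory and disk than the default configuration. Names and ARNs are placeholders.
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="my-high-memory-job",                           # placeholder job name
    Role="arn:aws:iam::123456789012:role/GlueJobRole",   # placeholder IAM role
    GlueVersion="3.0",
    WorkerType="G.2X",        # more memory/disk per worker than Standard or G.1X
    NumberOfWorkers=10,
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder script path
        "PythonVersion": "3",
    },
)
```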
Maximum capacity: Choose an integer from 2 to 100. The default is 10. This job type cannot have a fractional DPU allocation. For AWS Glue version 2.0 or later jobs, you cannot specify a Maximum capacity; instead, you specify a worker type and the number of workers.
According to the Glue API docs, the max you can allocate per Job execution is 100 DPUs. AllocatedCapacity – Number (integer). The number of AWS Glue data processing units (DPUs) allocated to runs of this job. From 2 to 100 DPUs can be allocated; the default is 10.
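For an older (Glue 1.0-style) job that is sized in DPUs rather than worker types, a run-level capacity override might look like the sketch below (job name is a placeholder; note that MaxCapacity cannot be combined with WorkerType/NumberOfWorkers on the same job):

```python
# Sketch: allocating more DPUs (2-100, default 10) for a single job run.
import boto3

glue = boto3.client("glue")

glue.start_job_run(
    JobName="my-high-memory-job",  # placeholder job name
    MaxCapacity=20.0,              # raise the DPU allocation for this run
)
```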
Despite the AWS documentation stating that the --conf parameter should not be passed, our AWS support team told us to pass --conf spark.driver.memory=10g, which corrected the issue we were having.
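One way to pass that parameter at run time is as a job argument, e.g. via boto3 as sketched below. Since AWS documents --conf as a reserved/internal parameter, treat this as the workaround the support team suggested rather than an officially supported setting; the job name is a placeholder.

```python
# Sketch: overriding the Spark driver memory for a run by passing the --conf
# job parameter, as described in the answer above.
import boto3

glue = boto3.client("glue")

glue.start_job_run(
    JobName="my-high-memory-job",  # placeholder job name
    Arguments={
        "--conf": "spark.driver.memory=10g",
    },
)
```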