The docs say that AWS Glue allocates 10 DPUs per ETL job and 5 DPUs per development endpoint by default, even though both can be configured with as few as 2 DPUs.
The docs also mention that crawlers are priced in per-second increments with a 10-minute minimum per run, but nowhere do they specify how many DPUs a crawler is allocated. Jobs and development endpoints can be configured in the Glue console to consume fewer DPUs, but I haven't seen any such setting for crawlers.
Is there a fixed amount of DPUs per crawler? Can we control that amount?
Right now there is no way to configure DPU memory, but you can request a limit increase on your account to be able to use more DPUs.
As of April 2019, there are two new worker types, G.1X and G.2X: you can now specify a worker type for Apache Spark jobs in AWS Glue for memory-intensive workloads.
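For jobs (unlike crawlers), capacity is set when the job is defined. The Glue `CreateJob` API (`glue.create_job` in boto3) takes either `MaxCapacity` (a DPU count) or `WorkerType` plus `NumberOfWorkers`, but not both. The helper below, `build_job_capacity_args`, is a hypothetical sketch that builds only those capacity arguments; the worker type and counts in any call are example values, not recommendations.

```python
from typing import Any, Dict, Optional


def build_job_capacity_args(
    max_capacity: Optional[float] = None,
    worker_type: Optional[str] = None,
    number_of_workers: Optional[int] = None,
) -> Dict[str, Any]:
    """Return capacity-related kwargs for glue.create_job (hypothetical helper).

    Intended for Apache Spark jobs. Glue accepts either MaxCapacity (DPUs)
    or WorkerType + NumberOfWorkers, so mixing the two is rejected here.
    """
    if max_capacity is not None and worker_type is not None:
        raise ValueError("Set either max_capacity or worker_type, not both")
    if worker_type is not None:
        # Worker-based sizing, e.g. "G.1X" or "G.2X" for memory-intensive jobs.
        if number_of_workers is None or number_of_workers < 1:
            raise ValueError("number_of_workers must be a positive integer")
        return {"WorkerType": worker_type, "NumberOfWorkers": number_of_workers}
    if max_capacity is not None:
        # Spark jobs need at least 2 DPUs according to the Glue docs.
        if max_capacity < 2:
            raise ValueError("Spark jobs need at least 2 DPUs")
        return {"MaxCapacity": max_capacity}
    # Nothing set: the service default applies (10 DPUs for Spark jobs).
    return {}
```

The returned dict would be unpacked into `glue.create_job(...)` alongside the required `Name`, `Role` and `Command` arguments. As far as I can tell, `create_crawler` has no comparable capacity parameter, which matches what the rest of this thread reports.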
Apache Spark and Spark Streaming job runs require a minimum of 2 DPU. By default, AWS Glue allocates 10 DPU to each Apache Spark job and 2 DPU to each streaming job. Jobs using AWS Glue version 0.9 or 1.0 have a 10-minute minimum billing duration, while jobs that use Glue versions 2.0 and later have a 1-minute minimum.
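To make the billing rules concrete, here is a rough cost estimator. It assumes per-second billing with a minimum duration: 10 minutes for Glue 0.9/1.0 jobs and for crawlers, 1 minute for Glue 2.0 and later. `ASSUMED_PRICE_PER_DPU_HOUR = 0.44` is an assumption, so check the pricing page for your region. Because AWS does not publish how many DPUs a crawler uses, any crawler estimate needs a DPU count you supply yourself.

```python
import math

# Assumption: on-demand price per DPU-hour; varies by region, check AWS pricing.
ASSUMED_PRICE_PER_DPU_HOUR = 0.44


def billed_seconds(run_seconds: float, minimum_seconds: int) -> int:
    """Round the run up to whole seconds, then apply the minimum billing duration."""
    return max(math.ceil(run_seconds), minimum_seconds)


def job_minimum_seconds(glue_version: str) -> int:
    """10-minute minimum for Glue 0.9/1.0 jobs, 1-minute minimum for 2.0 and later."""
    return 600 if glue_version in ("0.9", "1.0") else 60


# Crawlers: 10-minute minimum per run (their DPU count is not published).
CRAWLER_MINIMUM_SECONDS = 600


def run_cost(dpus: float, run_seconds: float, minimum_seconds: int,
             price_per_dpu_hour: float = ASSUMED_PRICE_PER_DPU_HOUR) -> float:
    """Estimated cost of one run: DPUs x billed hours x price per DPU-hour."""
    hours = billed_seconds(run_seconds, minimum_seconds) / 3600
    return dpus * hours * price_per_dpu_hour
```

For example, a 30-second run of a default 10-DPU Spark job is billed as 60 seconds on Glue 2.0 (`run_cost(10, 30, job_minimum_seconds("2.0"))`) but as 600 seconds on Glue 1.0, a tenfold difference for the same work. A crawler estimate such as `run_cost(assumed_dpus, 45, CRAWLER_MINIMUM_SECONDS)` is only as good as the `assumed_dpus` value you plug in.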
This is my conversation with AWS Support about this subject:
Hello, I'd like to know how many DPUs a crawler uses in order to calculate my costs with crawlers.
Their answer:
Dear AWS Customer,
Thank you for reaching out today. My name is Safari, I will assist with your case.
I understand that while compiling the cost of your Glue crawlers, you'd like to know the amount of DPUs a particular crawler uses.
Unfortunately, there is no direct way to find out the DPU consumption by a given crawler. I apologize for the inconvenience. However, you may see the total DPU consumption across all crawlers in your detailed bill under the section AWS Service Charges > Glue > {region} > AWS Glue CrawlerRun. Additionally, you can add tags to your crawlers and then enable "Cost Allocation Tags" from your AWS Billing and Cost Management console. This would allow AWS to generate a cost allocation report grouped by the predefined tags. For more on this, please see the documentation link below [1].
I hope this helps. Please let me know if I can provide you with any other assistance.
References [1]: https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html
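The tagging workaround Support suggests can be scripted. The sketch below takes boto3-style clients as arguments (in practice `boto3.client("glue")` and `boto3.client("ce")`) so the logic can run without AWS access. `tag_resource` and `get_cost_and_usage` are real client methods. The helper names, the tag key, and the `"AWS Glue"` value for the `SERVICE` dimension are assumptions to verify in your account. This yields cost per tag value, not DPUs, and the tag key must first be activated as a cost allocation tag in the Billing console.

```python
from typing import Any, Dict, List


def crawler_arn(region: str, account_id: str, crawler_name: str) -> str:
    """Build a Glue crawler ARN: arn:aws:glue:<region>:<account>:crawler/<name>."""
    return f"arn:aws:glue:{region}:{account_id}:crawler/{crawler_name}"


def tag_crawler(glue_client: Any, region: str, account_id: str,
                crawler_name: str, tags: Dict[str, str]) -> None:
    """Attach tags to a crawler (hypothetical helper around glue.tag_resource)."""
    glue_client.tag_resource(
        ResourceArn=crawler_arn(region, account_id, crawler_name),
        TagsToAdd=tags,
    )


def glue_cost_by_tag(ce_client: Any, tag_key: str,
                     start: str, end: str) -> List[Dict[str, Any]]:
    """Monthly AWS Glue cost grouped by one cost allocation tag (hypothetical helper).

    start/end are "YYYY-MM-DD" strings; the end date is exclusive in Cost Explorer.
    """
    response = ce_client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        # Assumption: Glue charges appear under the SERVICE value "AWS Glue".
        Filter={"Dimensions": {"Key": "SERVICE", "Values": ["AWS Glue"]}},
        GroupBy=[{"Type": "TAG", "Key": tag_key}],
    )
    return response["ResultsByTime"]
```

Giving each crawler its own tag value (for example a hypothetical `crawler` tag set to the crawler's name) would then split the Glue line into per-crawler costs, which is about as close as the billing tools get to the per-crawler DPU figure the question asks for.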
I discussed this with the AWS Support team as well: currently it's not possible to view or modify the DPU configuration for Glue crawlers. But do crawlers use DPUs at all?